geotiff: fail closed on malformed GDAL_METADATA XML under mask_and_scale#2999

Merged: brendancol merged 2 commits into main from issue-2998 on Jun 6, 2026

@brendancol


Closes #2998

What

Under mask_and_scale=True, a GeoTIFF whose GDAL_METADATA tag holds XML that is not well formed now fails closed with MalformedScaleOffsetError instead of reading raw, unscaled pixels. Previously, _parse_gdal_metadata() caught ET.ParseError and returned {}, which _extract_scale_offset() then treated as identity scaling, so a scale/offset hidden inside a corrupt XML payload was silently lost. PRs #2988 / #2992 closed the same risk for malformed numeric SCALE/OFFSET values; this PR covers the XML container itself.

  • _parse_gdal_metadata_strict() returns (dict, malformed). _parse_gdal_metadata() keeps its dict-only contract by discarding the flag, so the existing callers and the DOCTYPE/billion-laughs security drop are unchanged (that payload is still refused silently, on purpose).
  • GeoInfo.gdal_metadata_malformed carries the flag to the consumer and is inherited by overview IFDs alongside the XML.
  • _extract_scale_offset() raises MalformedScaleOffsetError when the flag is set.
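The three changes above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: the function names drop the leading underscore to mark them as stand-ins, the exception class is redefined locally, the DOCTYPE check is a simplified substring test, and keying items by their name attribute is an assumption about how the real parser builds its dict.

```python
import xml.etree.ElementTree as ET


class MalformedScaleOffsetError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def parse_gdal_metadata_strict(xml_text):
    """Return (items, malformed). Hypothetical sketch of the strict parser."""
    if "<!DOCTYPE" in xml_text.upper():
        # Security drop: refused silently and NOT flagged as malformed.
        return {}, False
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Corrupt container: report it instead of pretending it was empty.
        return {}, True
    items = {}
    for item in root.iter("Item"):
        name = item.get("name")
        if name is not None:
            items[name] = item.text or ""
    return items, False


def parse_gdal_metadata(xml_text):
    """Dict-only wrapper: existing callers never see the flag."""
    items, _malformed = parse_gdal_metadata_strict(xml_text)
    return items


def extract_scale_offset(items, malformed=False):
    """Fail closed on a malformed container, else return (scale, offset)."""
    if malformed:
        raise MalformedScaleOffsetError("GDAL_METADATA XML is not well formed")
    if not items:
        return 1.0, 0.0  # identity scaling when no metadata is present
    # Malformed numeric values are out of scope for this sketch.
    return float(items.get("SCALE", 1.0)), float(items.get("OFFSET", 0.0))
```

With this split, a truncated payload such as `<GDALMetadata><Item name="SCALE">0.1</Item>` yields `({}, True)` from the strict parser, while the dict-only wrapper still returns `{}` for callers that never ask for scaling.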

Backend coverage

The rejection lands on both mask_and_scale consumer sites: the eager path (_finalize_eager_read, used by numpy / cupy / GPU) and the dask path. numpy, cupy, dask+numpy, and dask+cupy all route through one of the two.

Test plan

  • Eager read of malformed XML raises MalformedScaleOffsetError
  • Dask read (chunks=2) of malformed XML raises MalformedScaleOffsetError
  • Read without mask_and_scale is unchanged (no rejection)
  • Existing DOCTYPE security test still returns {} (silent drop)
  • Full geotiff suite green (6206 passed, 81 skipped, 1 xfailed)

…ale (#2998)

_parse_gdal_metadata swallowed ET.ParseError and returned {}, so a file
with a corrupt GDAL_METADATA XML payload read as raw, unscaled pixels
under mask_and_scale=True instead of failing. PRs #2988/#2992 closed the
same silent wrong-pixels risk for malformed numeric SCALE/OFFSET values;
this closes it for the XML container itself.

- _parse_gdal_metadata_strict reports a malformed-XML flag separately
  from the dict; the DOCTYPE/billion-laughs drop stays silent.
- GeoInfo.gdal_metadata_malformed carries the flag to the consumer
  (inherited by overview IFDs alongside the XML).
- _extract_scale_offset raises MalformedScaleOffsetError when the flag
  is set, on both the eager and dask mask_and_scale paths.
- Reads that don't request mask_and_scale are unchanged.

Adds eager + dask rejection tests and a no-op-without-mask_and_scale
test for malformed XML.
@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on Jun 6, 2026

@brendancol left a comment

PR Review: fail closed on malformed GDAL_METADATA XML under mask_and_scale

Blockers

None.

Suggestions

None blocking. Splitting _parse_gdal_metadata_strict (returns the malformed flag) from the thin _parse_gdal_metadata wrapper keeps the existing callers and the DOCTYPE security test unchanged. Right call.

Nits

  • _attrs.py: _extract_scale_offset raises on malformed=True before the if not gdal_metadata short-circuit. The order is correct, since a malformed payload yields an empty dict and the malformed check has to run first. A one-line comment noting the order matters would protect against a later refactor that moves the empty-dict guard up and quietly silences the rejection.
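To make the ordering concern concrete, here is a small contrast between the two orderings. Both functions are hypothetical stand-ins written for this note (the real _extract_scale_offset signature is not shown in this PR page), and the exception class is redefined locally.

```python
class MalformedScaleOffsetError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def extract_guard_first(gdal_metadata, malformed):
    # The malformed check stays ahead of the empty-dict short-circuit,
    # because a malformed payload always arrives with an empty dict.
    if malformed:
        raise MalformedScaleOffsetError("malformed GDAL_METADATA XML")
    if not gdal_metadata:
        return 1.0, 0.0
    return (float(gdal_metadata.get("SCALE", 1.0)),
            float(gdal_metadata.get("OFFSET", 0.0)))


def extract_short_circuit_first(gdal_metadata, malformed):
    # Regressed order, shown only for contrast: the empty-dict guard
    # returns identity scaling before the flag is ever consulted.
    if not gdal_metadata:
        return 1.0, 0.0
    if malformed:
        raise MalformedScaleOffsetError("malformed GDAL_METADATA XML")
    return (float(gdal_metadata.get("SCALE", 1.0)),
            float(gdal_metadata.get("OFFSET", 0.0)))
```

Calling `extract_short_circuit_first({}, True)` returns identity scaling without raising, which is exactly the silent behavior this PR removes.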

What looks good

  • The fix targets the actual gap: ParseError is now separated from the DOCTYPE ValueError drop, so the billion-laughs guard stays silent while genuinely corrupt XML fails closed under mask_and_scale.
  • The malformed flag rides on GeoInfo and is inherited by overview IFDs alongside the XML it describes (_geotags.py overview-inheritance block), so a COG overview that inherits a corrupt base payload also fails closed.
  • Both mask_and_scale consumer sites are covered (eager _finalize_eager_read and the dask path). VRT does not consume _extract_scale_offset, so there is no third site to wire.
  • Reads without mask_and_scale are untouched: the flag is only read inside the mask_and_scale branch.
  • Tests cover eager + dask rejection and the no-op-without-mask_and_scale case. The malformed XML is injected through the raw gdal_metadata_xml writer kwarg because the writer escapes normal values.
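The overview-inheritance point can be pictured with a reduced model. This is a sketch under assumptions: only gdal_metadata_malformed is a field name confirmed by the PR; the gdal_metadata_xml field, the two-field GeoInfo, and the inherit_from_base helper are hypothetical and stand in for the _geotags.py overview-inheritance block.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GeoInfo:
    # Reduced, hypothetical model of the real GeoInfo.
    gdal_metadata_xml: Optional[str] = None
    gdal_metadata_malformed: bool = False


def inherit_from_base(overview, base):
    """Copy the XML and its malformed flag together, never one without the other."""
    if overview.gdal_metadata_xml is None and base.gdal_metadata_xml is not None:
        return replace(
            overview,
            gdal_metadata_xml=base.gdal_metadata_xml,
            gdal_metadata_malformed=base.gdal_metadata_malformed,
        )
    return overview  # an overview with its own XML keeps its own flag


base = GeoInfo(gdal_metadata_xml="<GDALMetadata><Item", gdal_metadata_malformed=True)
overview = inherit_from_base(GeoInfo(), base)
```

Keeping the flag next to the payload it describes means an overview that inherits a corrupt base payload is rejected the same way the base level would be.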

Checklist

  • Algorithm matches reference: n/a (bug fix)
  • Implemented backends consistent: yes (eager + dask both raise)
  • NaN handling: unaffected
  • Edge cases covered by tests: yes (malformed XML, with and without mask_and_scale, eager and dask)
  • Dask chunk boundaries: unaffected (rejection at graph-build, before chunking)
  • No premature materialization: yes
  • Benchmark: not needed
  • README feature matrix: not needed (no new API)
  • Docstrings present and accurate: yes

@brendancol left a comment

Follow-up review (after 097d32c)

The one nit from the previous pass is addressed: _extract_scale_offset now carries a comment explaining why the malformed check must stay ahead of the empty-dict short-circuit.

No remaining findings. Comment-only change; the malformed/mask_and_scale tests still pass (19 passed).

@brendancol merged commit 7ba2207 into main on Jun 6, 2026 (7 checks passed)

Development

Successfully merging this pull request may close these issues.

mask_and_scale=True silently ignores malformed GDAL_METADATA XML
