geotiff: fail closed on malformed GDAL_METADATA XML under mask_and_scale#2999
Merged
Conversation
…ale (#2998) _parse_gdal_metadata swallowed ET.ParseError and returned {}, so a file with a corrupt GDAL_METADATA XML payload read as raw, unscaled pixels under mask_and_scale=True instead of failing. PRs #2988/#2992 closed the same silent wrong-pixels risk for malformed numeric SCALE/OFFSET values; this closes it for the XML container itself. - _parse_gdal_metadata_strict reports a malformed-XML flag separately from the dict; the DOCTYPE/billion-laughs drop stays silent. - GeoInfo.gdal_metadata_malformed carries the flag to the consumer (inherited by overview IFDs alongside the XML). - _extract_scale_offset raises MalformedScaleOffsetError when the flag is set, on both the eager and dask mask_and_scale paths. - Reads that don't request mask_and_scale are unchanged. Adds eager + dask rejection tests and a no-op-without-mask_and_scale test for malformed XML.
brendancol
commented
Jun 6, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: fail closed on malformed GDAL_METADATA XML under mask_and_scale
Blockers
None.
Suggestions
None blocking. Splitting _parse_gdal_metadata_strict (returns the malformed flag) from the thin _parse_gdal_metadata wrapper keeps the existing callers and the DOCTYPE security test unchanged. Right call.
Nits
_attrs.py:_extract_scale_offsetraises onmalformed=Truebefore theif not gdal_metadatashort-circuit. The order is correct, since a malformed payload yields an empty dict and the malformed check has to run first. A one-line comment noting the order matters would protect against a later refactor that moves the empty-dict guard up and quietly silences the rejection.
What looks good
- The fix targets the actual gap:
ParseErroris now separated from the DOCTYPEValueErrordrop, so the billion-laughs guard stays silent while genuinely corrupt XML fails closed undermask_and_scale. - The malformed flag rides on
GeoInfoand is inherited by overview IFDs alongside the XML it describes (_geotags.pyoverview-inheritance block), so a COG overview that inherits a corrupt base payload also fails closed. - Both
mask_and_scaleconsumer sites are covered (eager_finalize_eager_readand the dask path). VRT does not consume_extract_scale_offset, so there is no third site to wire. - Reads without
mask_and_scaleare untouched: the flag is only read inside themask_and_scalebranch. - Tests cover eager + dask rejection and the no-op-without-
mask_and_scalecase. The malformed XML is injected through the rawgdal_metadata_xmlwriter kwarg because the writer escapes normal values.
Checklist
- Algorithm matches reference: n/a (bug fix)
- Implemented backends consistent: yes (eager + dask both raise)
- NaN handling: unaffected
- Edge cases covered by tests: yes (malformed XML, with and without
mask_and_scale, eager and dask) - Dask chunk boundaries: unaffected (rejection at graph-build, before chunking)
- No premature materialization: yes
- Benchmark: not needed
- README feature matrix: not needed (no new API)
- Docstrings present and accurate: yes
brendancol
commented
Jun 6, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review (after 097d32c)
The one nit from the previous pass is addressed: _extract_scale_offset now carries a comment explaining why the malformed check must stay ahead of the empty-dict short-circuit.
No remaining findings. Comment-only change; the malformed/mask_and_scale tests still pass (19 passed).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2998
What
Under
mask_and_scale=True, a GeoTIFF whoseGDAL_METADATAtag holds non-well-formed XML now fails closed withMalformedScaleOffsetErrorinstead of reading raw, unscaled pixels._parse_gdal_metadata()caughtET.ParseErrorand returned{}, which_extract_scale_offset()then read as identity scaling. A scale/offset hidden inside a corrupt XML payload was lost silently. PRs #2988 / #2992 closed the same risk for malformed numeric SCALE/OFFSET; this covers the XML container itself._parse_gdal_metadata_strict()returns(dict, malformed)._parse_gdal_metadata()keeps its dict-only contract by discarding the flag, so the existing callers and the DOCTYPE/billion-laughs security drop are unchanged (that payload is still refused silently, on purpose).GeoInfo.gdal_metadata_malformedcarries the flag to the consumer and is inherited by overview IFDs alongside the XML._extract_scale_offset()raisesMalformedScaleOffsetErrorwhen the flag is set.Backend coverage
The rejection lands on both
mask_and_scaleconsumer sites: the eager path (_finalize_eager_read, used by numpy / cupy / GPU) and the dask path. numpy, cupy, dask+numpy, and dask+cupy all route through one of the two.Test plan
MalformedScaleOffsetErrorchunks=2) of malformed XML raisesMalformedScaleOffsetErrormask_and_scaleis unchanged (no rejection){}(silent drop)