Skip to content

geotiff: reject malformed SCALE/OFFSET under mask_and_scale#2992

Merged
brendancol merged 6 commits into
mainfrom
issue-2987
Jun 6, 2026
Merged

geotiff: reject malformed SCALE/OFFSET under mask_and_scale#2992
brendancol merged 6 commits into
mainfrom
issue-2987

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2987

_extract_scale_offset returned the 1.0 / 0.0 identity default whenever a SCALE or OFFSET value failed to parse as a float, so a malformed item read the same as a missing one. Under mask_and_scale=True the caller asked the reader to honour the metadata, and the silent fallback let the raw unscaled pixels through as if the file were clean.

Changes:

  • _extract_scale_offset now raises the new MalformedScaleOffsetError (a GeoTIFFAmbiguousMetadataError/ValueError subclass) when a SCALE or OFFSET key is present but unparseable. An absent key still returns the identity default, so a source with no scale/offset reads as before.
  • The error names the offending key and value.
  • Added the error to the geotiff package exports and a note to the reference docs.

Both mask_and_scale call sites (eager _finalize_eager_read and the dask backend) reach _extract_scale_offset during graph assembly, so the rejection fires before compute on both. Reads without mask_and_scale never touch the metadata and are unaffected.

Test plan:

  • test_mask_and_scale_malformed_scale_raises / _offset_raises (eager)
  • test_mask_and_scale_malformed_scale_dask_raises (dask)
  • test_malformed_scale_ignored_without_mask_and_scale (no opt-in, no rejection)
  • Existing test_rioxarray_compat_2961.py suite still green (29 passed)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 6, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: geotiff: reject malformed SCALE/OFFSET under mask_and_scale

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/geotiff/_errors.py: the hierarchy tree in the module docstring (Exception -> GeoTIFFAmbiguousMetadataError -> ...) already drops several existing subclasses, so leaving MalformedScaleOffsetError out of it stays consistent with what's there. The tree is already stale, so adding the new class is optional cleanup, not a fix.

What looks good

  • The change is confined to the one helper the finding named, which keeps the conflict surface small against the sibling PR editing the same function.
  • The absent-key vs present-but-malformed split is handled right. The loop only inspects a key when it's actually in the dict and returns the identity default only when no key is present. A malformed dataset-level SCALE raises instead of falling through to the per-band key, which matches the old code: on a parse failure it returned the default, never the per-band value.
  • The new error subclasses GeoTIFFAmbiguousMetadataError/ValueError, so existing except ValueError callers keep catching it. Same shape as the InvalidIntegerNodataError precedent for this class of silent-coercion bug.
  • The message names the offending key and value.
  • Both call sites hit _extract_scale_offset at graph-assembly time, so the dask path fails closed before compute rather than lazily. The dask test confirms it.
  • Tests cover eager SCALE, eager OFFSET, dask, and the no-opt-in case, with uniquely named temp files carrying the issue number.

Checklist

  • Algorithm matches reference: n/a (validation guard, no numerical algorithm)
  • Backends consistent: eager and dask both raise; GPU mask_and_scale is already rejected upstream
  • NaN handling: untouched by this PR
  • Edge cases tested: absent key, present-malformed SCALE, present-malformed OFFSET, no opt-in
  • Dask chunk boundaries: n/a, the raise is at graph build
  • No premature materialization: confirmed, no new array ops
  • Benchmark: not needed (error path)
  • README feature matrix: n/a (no new function, no backend change)
  • Docstrings present and accurate: yes, on the new error class and the helper

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after nit fix)

The one nit from the first pass is resolved: MalformedScaleOffsetError is now listed in the _errors.py hierarchy docstring tree. I left the other pre-existing subclasses missing from that tree alone, since backfilling them is unrelated to this PR.

No new findings. No blockers, no open suggestions. The change stays confined to the malformed-value handling in _extract_scale_offset plus the new error class and its exports.

@brendancol brendancol merged commit 8e8482e into main Jun 6, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: malformed SCALE/OFFSET silently ignored under mask_and_scale

1 participant