Skip to content

geotiff: reject mixed per-band SCALE/OFFSET under mask_and_scale (#2988)#2993

Merged
brendancol merged 4 commits into
mainfrom
issue-2988
Jun 6, 2026
Merged

geotiff: reject mixed per-band SCALE/OFFSET under mask_and_scale (#2988)#2993
brendancol merged 4 commits into
mainfrom
issue-2988

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2988

mask_and_scale=True read a multi-band raster with one (scale, offset) pair, falling back to band 0's per-band values when there was no dataset-level value. Bands with a different scale/offset came back numerically wrong while the attrs still looked valid, so downstream spatial functions had no signal that the data was corrupt.

This makes _extract_scale_offset band-aware:

  • A dataset-level SCALE/OFFSET still applies to the whole array.
  • Per-band values that agree across bands still apply to the whole array.
  • Per-band values that disagree now raise MixedBandMetadataError when no band is selected, mirroring the existing per-band nodata handling. Selecting a single band with band= reads it with that band's own scale/offset.

Both the eager (numpy) and dask read paths thread the selected band into the helper, so they reject and scale identically.

Backend coverage: CPU eager (numpy) and dask. mask_and_scale already raises ValueError with gpu=True and on .vrt sources, so cupy / VRT paths are unaffected.

Test plan:

  • Mixed per-band SCALE raises on eager and dask
  • Mixed per-band OFFSET raises
  • band= selects that band's scale/offset (eager and dask agree)
  • Uniform per-band and dataset-level values still apply unchanged
  • Existing mask_and_scale tests still pass

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 6, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: geotiff: reject mixed per-band SCALE/OFFSET under mask_and_scale (#2988)

Blockers

None.

Suggestions

  • xrspatial/geotiff/_attrs.py:1541 -- the docstring says the band-selected case falls back to "the dataset-level value, then the default", but the code returns early on any dataset-level key (line 1555), so the per-band branch is only reached when there is no dataset-level value. When a band is selected and that band has no per-band entry, the code returns default, never a dataset-level value. Trim the docstring to "that band's per-band value, else the default".

Nits

  • xrspatial/geotiff/_attrs.py:1553 -- _resolve(name, dataset_key, default) is always called with name == dataset_key ('SCALE'/'SCALE', 'OFFSET'/'OFFSET'). The dataset_key parameter is redundant; drop it and use name for both lookups.

What looks good

  • The fix lands in the one shared helper both read paths call, and both eager and dask now pass band, so they reject and scale identically. The dask call resolves scale/offset at graph-build time, so the rejection raises eagerly rather than at compute, matching the eager path.
  • Distinctness is compared on coerced floats, not raw strings, so "2.0" and "2.00" don't trip a false mixed-metadata error.
  • Tests cover mixed SCALE, mixed OFFSET, band selection on both backends, uniform per-band, and the existing dataset-level cases still pass.
  • Reuses the existing MixedBandMetadataError, consistent with the per-band nodata rejection already in the module.

Checklist

  • NaN handling is correct for the masking path (unchanged)
  • Both implemented backends (eager, dask) produce consistent results
  • Edge cases covered by tests
  • No premature materialization
  • Docstring present; one overstated line noted above
  • README feature matrix -- not applicable, no new public function or backend change

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after ee02548)

Both findings from the first pass are addressed:

  • Suggestion (docstring overstated the band-selected fallback): fixed. The docstring now reads "that band's per-band value (or the default when that band carries none)", which matches the code.
  • Nit (redundant dataset_key parameter on _resolve): fixed. _resolve(name, default) now uses name for both the dataset-level and per-band lookups.

No new findings. The 12 mask_and_scale tests still pass. Nothing else outstanding.

# Conflicts:
#	xrspatial/geotiff/_attrs.py
#	xrspatial/geotiff/tests/read/test_rioxarray_compat_2961.py
@brendancol brendancol merged commit f4dca89 into main Jun 6, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mask_and_scale=True silently corrupts multi-band rasters with differing per-band SCALE/OFFSET

1 participant