spline: upload invariant point/weight data to GPU once on dask+cupy path by brendancol · Pull Request #2929 · xarray-contrib/xarray-spatial

brendancol · 2026-06-04T15:47:23Z

What

The thin plate spline interpolator's dask+cupy backend re-uploaded the input point coordinates and the solved TPS weight vector to the GPU once per dask chunk. Those arrays are the same for every chunk, so a raster split into N tiles copied them across PCIe N times.

This adds _tps_evaluate_gpu, which takes the point and weight arrays already on the device and uploads only the per-chunk grid slices. _spline_cupy uploads once then delegates; _spline_dask_cupy uploads the invariants once before the per-chunk closure and reuses them.

Backend coverage

numpy: unchanged
cupy (single array): unchanged behaviour, now routes through the shared GPU evaluator
dask+numpy: unchanged
dask+cupy: invariant uploads drop from 3 per chunk to 3 total

Verification

dask+cupy invariant point/weight uploads on a 16-chunk grid: 48 before, 3 after
numpy / cupy / dask+cupy parity holds to ~1e-14
New regression test asserts the dask+cupy path uploads the point/weight arrays exactly three times regardless of chunk count; it fails against the old code (48 != 3) and passes with the fix
Added cupy and dask+cupy parity tests for spline (the suite previously had neither)

Test plan

pytest xrspatial/tests/test_interpolation.py (44 passed)
pytest xrspatial/tests/test_interpolation.py::TestSpline with real cupy on a CUDA host (8 passed)
Regression test confirmed red without the source fix

Part of a deep-sweep performance audit (scope=spline-only). No issue was filed for this finding; it was surfaced directly by the audit.

The dask+cupy backend re-uploaded the input point coordinates and the solved TPS weight vector to the GPU once per chunk. These arrays are the same for every chunk, so a raster split into N tiles copied them across PCIe N times. Add _tps_evaluate_gpu, which takes already-on-device point/weight arrays and uploads only the per-chunk grid slices. _spline_cupy uploads once then delegates; _spline_dask_cupy uploads the invariants once before the per-chunk closure and reuses them. Verified: dask+cupy invariant uploads drop from 3*n_chunks to 3 (16x on a 16-chunk grid); numpy/cupy/dask+cupy parity holds to ~1e-14. Adds a regression test plus cupy and dask+cupy parity tests for spline. scope=spline-only

brendancol

PR Review: spline upload invariant point/weight data to GPU once on dask+cupy path

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

The _chunk closure in _spline_dask_cupy (xrspatial/interpolate/_spline.py:229) now captures the cupy device arrays x_gpu, y_gpu, w_gpu. Under the threaded or synchronous scheduler this is the intended win: the buffers are shared by reference and uploaded once. Under a distributed scheduler the closure is pickled per task, and pickling a cupy array round-trips through host memory, which would reintroduce a per-task transfer. The previous code uploaded inside the worker, so this is not a regression for the common single-GPU path. A one-line comment noting the threaded-scheduler assumption would help the next reader.

Nits (optional improvements)

None.

What looks good

The refactor is behavior-preserving: _tps_evaluate_gpu is the old _spline_cupy body minus the three invariant uploads, which moved to the callers. The CUDA kernel and its bounds guard are untouched.
Verified numerically: numpy, cupy, and dask+cupy agree to ~1e-14.
The regression test asserts exactly three invariant uploads regardless of chunk count, and it fails against the pre-fix code (48 != 3), so it actually guards the fix.
The PR also adds cupy and dask+cupy parity tests for spline, which the suite did not have before.

Checklist

Algorithm matches reference: no algorithm change, evaluation math identical
All implemented backends produce consistent results: verified to ~1e-14
NaN handling is correct: unchanged
Edge cases are covered by tests: single point, n<3, memory guard already covered
Dask chunk boundaries handled correctly: grid slices unchanged, per-chunk shape preserved
No premature materialization or unnecessary copies: this is the fix
Benchmark exists or is not needed: no benchmark; micro-optimization on an existing path
README feature matrix updated (if applicable): not applicable, no API change
Docstrings present and accurate: new helper documented

…review follow-up)

brendancol

PR Review (follow-up pass)

The earlier suggestion is addressed: _spline_dask_cupy now carries a comment spelling out the threaded/synchronous-scheduler assumption behind the closure capture, and notes that a distributed scheduler is no worse than the previous per-chunk upload.

No remaining blockers, suggestions, or nits. Spline tests stay green (8 passed in TestSpline, including the cupy and dask+cupy parity tests and the upload-count regression).

github-actions Bot added the performance PR touches performance-sensitive code label Jun 4, 2026

brendancol commented Jun 4, 2026

View reviewed changes

Note threaded-scheduler assumption in spline dask+cupy upload hoist (…

fc2a862

…review follow-up)

brendancol commented Jun 4, 2026

View reviewed changes

sweep-performance: record interpolate_spline audit (scope=spline-only)

8eb1eca

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spline: upload invariant point/weight data to GPU once on dask+cupy path#2929

spline: upload invariant point/weight data to GPU once on dask+cupy path#2929
brendancol wants to merge 3 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-performance-interpolate_spline-2026-06-04

brendancol commented Jun 4, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 4, 2026

What

Backend coverage

Verification

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: spline upload invariant point/weight data to GPU once on dask+cupy path

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review (follow-up pass)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant