[Python] Add table handles and snapshot object model #862
zacdav-db wants to merge 6 commits into delta-io:main
Conversation
@PatrickJin-db this is the first, smaller PR in a sequence of changes to add a clearer UI/UX + Arrow support. Referencing the goals of #860.
linzhou-db left a comment:
Looking good!
Need @PatrickJin-db to add a unit test on a real table. Or could you try to add one based on existing tables, and let Patrick run the test locally?
    ).to_pandas()

    class DeltaSharingSnapshot:
What if we call it TableSnapshot?
Addressed the changes. Also added a test.
PatrickJin-db left a comment:
Overall direction looks good. This same interface can also be used for Polars in the future.
Will try running the tests locally tomorrow.
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
    use_delta_format: Optional[bool] = None,
    convert_in_batches: bool = False,
I took a look at the larger PR (#861) and it seems like convert_in_batches is only used by to_pandas and not to_arrow. If you don't plan to use convert_in_batches in to_arrow, then I think it makes more sense to have it be an argument of to_pandas rather than a field of TableSnapshot.
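The reviewer's suggestion can be sketched as follows. This is a hypothetical illustration of the proposed signature change, not the PR's actual code: the method bodies are stubs that just make the call shape visible, and the field names are copied from the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableSnapshot:
    # Snapshot-level options stay on the class...
    version: Optional[int] = None
    timestamp: Optional[str] = None
    use_delta_format: Optional[bool] = None
    # ...but there is no convert_in_batches field here.

    def to_pandas(self, convert_in_batches: bool = False) -> dict:
        # The pandas-only option travels as a method argument instead.
        # A real implementation would fetch files and build a DataFrame;
        # this stub just echoes the flag so the call shape is visible.
        return {"convert_in_batches": convert_in_batches}

    def to_arrow(self) -> str:
        # to_arrow never used the flag, so it takes no such argument.
        return "arrow-table-placeholder"


snapshot = TableSnapshot(version=3)
print(snapshot.to_pandas(convert_in_batches=True))
```

This keeps `TableSnapshot` format-agnostic: options that only affect one materializer live on that materializer's method.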
    # Fetch 10 rows from a table and convert it to a Pandas DataFrame. This can be used to read sample data from a table that cannot fit in the memory.
    print("########### Loading 10 rows from delta_sharing.default.owid-covid-data as a Pandas DataFrame #############")
    data = delta_sharing.load_as_pandas(table_url, limit=10)
    # Configure a scan and fetch 10 rows from a table as a Pandas DataFrame.
nit: preserve the original comment
        convert_in_batches=self._convert_in_batches,
    )

    def to_pandas(self) -> pd.DataFrame:
let's also add to_spark
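One possible shape for the requested to_spark method, sketched under the assumption that it delegates to an active SparkSession via the "deltaSharing" data source, the way delta-sharing's existing load_as_spark helper does. The class below is a stand-in, not the merged implementation, and `table_url` is an assumed attribute.

```python
class SnapshotWithSpark:
    """Hypothetical sketch; table_url is an assumed attribute name."""

    def __init__(self, table_url: str):
        self.table_url = table_url

    def to_spark(self):
        # pyspark is an optional dependency, so import it lazily.
        try:
            from pyspark.sql import SparkSession
        except ImportError as e:
            raise ImportError("to_spark requires pyspark to be installed") from e
        spark = SparkSession.getActiveSession()
        if spark is None:
            raise RuntimeError(
                "No active SparkSession; to_spark must run inside a Spark job"
            )
        # Same reader path load_as_spark uses: the deltaSharing data source.
        return spark.read.format("deltaSharing").load(self.table_url)
```

Outside a Spark job this raises rather than silently creating a session, mirroring load_as_spark's behavior.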
Thanks. I'll move some of the changes in secondary PRs into this one based on the feedback.
Signed-off-by: Zac Davies <zachary.davies+data@databricks.com>
Signed-off-by: Zac Davies <zachary.davies+data@databricks.com>
(cherry picked from commit 4df37d7)
Signed-off-by: Zac Davies <zachary.davies+data@databricks.com>
Given the request for more of the …
@PatrickJin-db let me know if you've had time to test locally or if there is anything I can do to help move things along.
@zacdav-db Sorry for the wait. A few general asks I have are:
Also, you should be able to run the unit tests locally. I am only required to run tests if they are integration tests (those marked by SKIP_INTEGRATION).
    captured["convert_in_batches"] = self._convert_in_batches
    return expected

    monkeypatch.setattr("delta_sharing.delta_sharing.DeltaSharingReader.to_arrow", fake_to_arrow)
doesn't this monkeypatch kind of defeat the purpose of this test?
    captured["convert_in_batches"] = self._convert_in_batches
    return expected

    monkeypatch.setattr("delta_sharing.delta_sharing.DeltaSharingReader.to_pandas", fake_to_pandas)
same here. In general I'd prefer not using monkeypatch for unit tests, and keeping the server response and parquet file data as the only things we mock.
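The distinction the reviewer is drawing can be shown with a toy example. Everything here is hypothetical (a stand-in server and reader, not the real DeltaSharingReader internals): the point is that only the lowest-level dependency is stubbed, so the method under test still runs its own logic instead of being replaced wholesale by a monkeypatch.

```python
class FakeServer:
    """Stands in for a mocked Delta Sharing server response (hypothetical)."""

    def list_files(self, table: str):
        return [{"url": "part-0.parquet"}, {"url": "part-1.parquet"}]


class ToyReader:
    """Toy analogue of a reader. Only the injected server is fake, so
    to_pandas's own logic is still exercised by the test."""

    def __init__(self, server):
        self.server = server

    def to_pandas(self, table: str):
        files = self.server.list_files(table)
        # Real code would download and concatenate the parquet files here.
        return [f["url"] for f in files]


reader = ToyReader(FakeServer())
assert reader.to_pandas("share.schema.t") == ["part-0.parquet", "part-1.parquet"]
```

Patching to_pandas itself would make the assertion pass even if the conversion logic were broken; stubbing the server response keeps the code path under test honest.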
Summary:
Add the core object model for Python snapshot reads.
This PR introduces SharingClient.table, DeltaSharingTable, TableSnapshot, table.snapshot, and table.to_pandas as a pure full-snapshot materializer.
Scope:
Not included:
Testing:
Part of #860.
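The names in the summary compose roughly like this. A toy sketch with stand-in classes: the real implementation fetches data over the sharing protocol, and the method bodies below are placeholders that only show how the handles chain together.

```python
class TableSnapshot:
    def __init__(self, url: str, version=None):
        self.url = url
        self.version = version

    def to_pandas(self):
        # Full-snapshot materialization would happen here.
        return f"DataFrame({self.url}@{self.version})"


class DeltaSharingTable:
    def __init__(self, url: str):
        self.url = url

    def snapshot(self, version=None) -> TableSnapshot:
        # table.snapshot pins a version (or the latest, when None).
        return TableSnapshot(self.url, version)

    def to_pandas(self):
        # Convenience path: materialize the latest snapshot.
        return self.snapshot().to_pandas()


class SharingClient:
    def table(self, url: str) -> DeltaSharingTable:
        # SharingClient.table returns a handle without fetching data.
        return DeltaSharingTable(url)


client = SharingClient()
result = client.table("share.schema.table").to_pandas()
```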