
feat(parquet): expose ParquetReadOptions via registerParquet and readParquet #18

Merged
andygrove merged 6 commits into apache:main from andygrove:register-parquet-options on May 13, 2026


@andygrove
Member

Closes #17.

Exposes DataFusion's ParquetReadOptions on the Java surface and adds SessionContext.readParquet(...).

What's in this PR

  • org.apache.datafusion.ParquetReadOptions — a mutable class with five fluent setters: fileExtension, parquetPruning, skipMetadata, metadataSizeHint, and schema (an Arrow Java Schema).
  • SessionContext.registerParquet(name, path, options) — a new overload of the existing method. The existing registerParquet(name, path) is preserved and delegates to the overload with default options.
  • SessionContext.readParquet(path[, options]) -> DataFrame — the analog of Rust's ctx.read_parquet; returns a DataFrame without registering a table.
  • The schema is marshalled across JNI as Arrow IPC stream bytes, reusing the IPC mechanism that #13 (feat(proto): execute Java-built protobuf plans via SessionContext) added for tableSchema.
  • 7 new tests: 3 unit tests for the options class, 4 integration tests (default options row count, custom file extension, explicit schema, metadata size hint).
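A minimal sketch of the pattern the list describes: a mutable class whose setters return this, and a two-argument registerParquet that delegates to the three-argument overload with default options. It assumes nothing about the real binding beyond what the list states. The class names ParquetReadOptionsSketch and SessionContextSketch, the get* accessors, the null-means-unset convention, and the negative-hint check are all hypothetical. The Arrow Schema setter and the JNI call are omitted so the example compiles with only the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical sketch, NOT the real org.apache.datafusion.ParquetReadOptions.
// The schema(Schema) setter is omitted to avoid an Arrow dependency.
final class ParquetReadOptionsSketch {
    // Assumed default, mirroring DataFusion's Rust default of ".parquet".
    private String fileExtension = ".parquet";
    // null means "not set": defer to the session configuration.
    private Boolean parquetPruning;
    private Boolean skipMetadata;
    private Long metadataSizeHint;

    // Each setter mutates this instance and returns it, enabling chaining.
    ParquetReadOptionsSketch fileExtension(String ext) {
        this.fileExtension = Objects.requireNonNull(ext, "ext");
        return this;
    }

    ParquetReadOptionsSketch parquetPruning(boolean enabled) {
        this.parquetPruning = enabled;
        return this;
    }

    ParquetReadOptionsSketch skipMetadata(boolean skip) {
        this.skipMetadata = skip;
        return this;
    }

    // Hypothetical validation: a byte-count hint cannot be negative.
    ParquetReadOptionsSketch metadataSizeHint(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("metadataSizeHint must be >= 0");
        }
        this.metadataSizeHint = bytes;
        return this;
    }

    String getFileExtension() { return fileExtension; }

    Optional<Boolean> getParquetPruning() { return Optional.ofNullable(parquetPruning); }

    Optional<Boolean> getSkipMetadata() { return Optional.ofNullable(skipMetadata); }

    OptionalLong getMetadataSizeHint() {
        return metadataSizeHint == null ? OptionalLong.empty() : OptionalLong.of(metadataSizeHint);
    }
}

// Hypothetical stand-in for SessionContext that only records registrations,
// to show the "old overload delegates with default options" pattern.
final class SessionContextSketch {
    private final Map<String, ParquetReadOptionsSketch> registrations = new HashMap<>();

    // Pre-existing two-argument form: delegates with fresh default options.
    void registerParquet(String name, String path) {
        registerParquet(name, path, new ParquetReadOptionsSketch());
    }

    // New overload. A real binding would pass path and options across JNI here.
    void registerParquet(String name, String path, ParquetReadOptionsSketch options) {
        Objects.requireNonNull(path, "path");
        registrations.put(name, Objects.requireNonNull(options, "options"));
    }

    ParquetReadOptionsSketch optionsFor(String name) { return registrations.get(name); }
}
```

Keeping the unset fields as null (surfaced as empty Optionals) is one way to let a binding distinguish "use the session default" from an explicit false or zero; the actual class may encode this differently.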

Not in this PR

  • table_partition_cols (Hive-style partition columns) — requires per-column ArrowType marshalling.
  • file_sort_order — requires logical-Expr serialization.
  • file_decryption_properties — encryption; niche.
  • A static-factory builder() — fluent setters suffice.
  • CSV/JSON/Avro analog option classes.

andygrove merged commit 233243c into apache:main on May 13, 2026
1 check passed
andygrove deleted the register-parquet-options branch on May 13, 2026 at 05:51

Development: linked issue "Expose ParquetReadOptions via registerParquet and readParquet"
1 participant