
feat(dataframe): add limit, distinct, dropColumns, withColumnRenamed #30

Merged
andygrove merged 5 commits into apache:main from andygrove:feat/dataframe-methods-batch on May 13, 2026

@andygrove
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

The Java DataFrame API currently exposes only select and filter as
transformations. This PR adds the subset of DataFusion's Rust DataFrame API
that is trivial to implement, so users can build small pipelines without
falling back to SQL strings.

What changes are included in this PR?

Four new methods on DataFrame, each backed by a thin JNI bridge that calls
the corresponding DataFusion operator:

  • limit(int fetch) / limit(int skip, int fetch): return at most fetch rows, optionally after skipping the first skip rows. Negative arguments throw IllegalArgumentException.
  • distinct(): remove duplicate rows, comparing values across all columns.
  • dropColumns(String... columnNames): the inverse of select; removes the named columns. Unknown column names are silently ignored (matching DataFusion's drop_columns semantics).
  • withColumnRenamed(String oldName, String newName): rename a column. An unknown oldName is a no-op (matching DataFusion's with_column_renamed semantics).
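To make the semantics above concrete, here is a minimal plain-Java model of them. ToyFrame and its row helper are hypothetical names invented for this sketch: they operate on an in-memory list of rows and do not call DataFusion or the JNI bridge. Simplifications such as ignoring qualified column names are assumptions of the sketch, not statements about the real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Hypothetical in-memory stand-in used only to illustrate the semantics listed above.
 * It is NOT org.apache.datafusion.DataFrame and does not call DataFusion.
 */
final class ToyFrame {
    final List<String> columns;
    final List<List<Object>> rows;

    ToyFrame(List<String> columns, List<List<Object>> rows) {
        this.columns = List.copyOf(columns);
        this.rows = List.copyOf(rows);
    }

    /** Helper for building a row; Arrays.asList keeps the element type as Object. */
    static List<Object> row(Object... values) {
        return Arrays.asList(values);
    }

    ToyFrame limit(int fetch) {
        return limit(0, fetch);
    }

    ToyFrame limit(int skip, int fetch) {
        if (skip < 0 || fetch < 0) {
            throw new IllegalArgumentException("skip and fetch must be non-negative");
        }
        int from = Math.min(skip, rows.size());
        // Widen to long so that from + fetch cannot overflow int.
        int to = (int) Math.min((long) from + fetch, rows.size());
        return new ToyFrame(columns, rows.subList(from, to));
    }

    ToyFrame distinct() {
        // Rows are Lists, so equality is element-wise across all columns.
        // LinkedHashSet keeps first-seen order for determinism; DataFusion itself
        // does not promise any particular output order.
        return new ToyFrame(columns, new ArrayList<>(new LinkedHashSet<>(rows)));
    }

    ToyFrame dropColumns(String... columnNames) {
        List<String> drop = Arrays.asList(columnNames);
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            // Names that match no column are never consulted, so they are ignored.
            if (!drop.contains(columns.get(i))) {
                keep.add(i);
            }
        }
        List<String> newColumns = new ArrayList<>();
        for (int i : keep) {
            newColumns.add(columns.get(i));
        }
        List<List<Object>> newRows = new ArrayList<>();
        for (List<Object> r : rows) {
            List<Object> projected = new ArrayList<>();
            for (int i : keep) {
                projected.add(r.get(i));
            }
            newRows.add(projected);
        }
        return new ToyFrame(newColumns, newRows);
    }

    ToyFrame withColumnRenamed(String oldName, String newName) {
        List<String> newColumns = new ArrayList<>(columns);
        int idx = newColumns.indexOf(oldName);
        if (idx >= 0) {
            newColumns.set(idx, newName);
        } // An unknown oldName leaves the schema unchanged (no-op).
        return new ToyFrame(newColumns, rows);
    }
}
```

Each method returns a new ToyFrame and never mutates the receiver, which is the property that makes chaining several transformations off one shared source safe.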

All four follow the existing pattern: they throw IllegalStateException when called on a DataFrame that has been closed or collected, and they leave the receiver usable, so deriving several DataFrames from one shared source is non-destructive.
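The lifecycle pattern can also be modeled in plain Java. In this sketch, FrameHandle, its State enum, and the list of operator names standing in for the native plan are all hypothetical; the assumption that collect() marks the handle as consumed is inferred from the phrase "closed/collected handles" and may not match how the JNI-backed DataFrame tracks its state.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the handle lifecycle described above. It is NOT the
 * datafusion-java DataFrame; a list of operator names stands in for the native plan.
 */
final class FrameHandle implements AutoCloseable {
    private enum State { OPEN, COLLECTED, CLOSED }

    private final List<String> plan;
    private State state = State.OPEN;

    FrameHandle(List<String> plan) {
        this.plan = List.copyOf(plan);
    }

    private void ensureUsable() {
        if (state == State.CLOSED) {
            throw new IllegalStateException("DataFrame has been closed");
        }
        if (state == State.COLLECTED) {
            throw new IllegalStateException("DataFrame has already been collected");
        }
    }

    /** Builds a new handle for plan + op; the receiver's state is never modified. */
    private FrameHandle derive(String op) {
        ensureUsable();
        List<String> next = new ArrayList<>(plan);
        next.add(op);
        return new FrameHandle(next);
    }

    FrameHandle distinct() {
        return derive("distinct");
    }

    FrameHandle dropColumns(String... columnNames) {
        return derive("drop(" + String.join(",", columnNames) + ")");
    }

    /** Terminal operation (assumed): afterwards the handle rejects further calls. */
    List<String> collect() {
        ensureUsable();
        state = State.COLLECTED;
        return plan;
    }

    @Override
    public void close() {
        state = State.CLOSED;
    }
}
```

Because derive only reads the receiver and builds a fresh handle, two transformations taken from the same source are independent until one of them is collected or closed.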

Are these changes tested?

Yes. DataFrameTransformationsTest gains 12 new tests covering happy paths,
non-destructive semantics, invalid arguments, and the silent-ignore behaviors
for dropColumns/withColumnRenamed. The existing methodsThrowAfterClose
and methodsThrowAfterCollect lifecycle tests are extended to cover the new
methods.

Are there any user-facing changes?

Yes — four new public methods on org.apache.datafusion.DataFrame. No
breaking changes.

andygrove force-pushed the feat/dataframe-methods-batch branch from be7a83f to 0b8592e on May 13, 2026 20:12
andygrove merged commit fe54208 into apache:main on May 13, 2026
1 check passed
andygrove deleted the feat/dataframe-methods-batch branch on May 13, 2026 21:27