feat(dataframe): expose withColumn and unnestColumns #54

Merged: andygrove merged 1 commit into apache:main from LantaoJin:feat/dataframe-with-column-unnest on May 17, 2026

@LantaoJin LantaoJin commented May 15, 2026

Which issue does this PR close?

Rationale for this change

DataFrame gained dropColumns and withColumnRenamed in #30, but the most common column-shaping primitive, adding or replacing a column from a SQL expression, was still missing. unnestColumns belongs to the same family and covers struct / list flattening. The issue (#41) lists both as a single unit of work.

withColumn follows the same JNI shape as filter(String) from #19: the SQL fragment is parsed on the native side via df.parse_sql_expr, so no Java-side Expr model is required. unnestColumns takes plain column names, as dropColumns does. That keeps this PR independent of any future joins / Expr-builder work.

What changes are included in this PR?

  • DataFrame.withColumn(String name, String expr) — replaces a column of the same name in place, otherwise appends. Mirrors DataFusion::DataFrame::with_column. The expression is parsed against this DataFrame's own schema using the same parse_sql_expr convention as filter(String). The receiver remains usable.
  • DataFrame.unnestColumns(String... columns) — defaults to upstream's UnnestOptions::new() (i.e. preserve_nulls = true).
  • DataFrame.unnestColumns(UnnestOptions options, String... columns) — explicit options overload. Routes to unnest_columns_with_options upstream.
  • UnnestOptions Java class with a single preserveNulls(boolean) knob (default true, matching upstream).
  • JNI handlers in native/src/lib.rs: withColumnExpr and unnestColumns. Both follow the existing patterns (filterRows / dropColumns) — no new imports beyond datafusion::common::UnnestOptions.
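
The replace-or-append rule described for withColumn can be illustrated with a toy model. This is only a sketch, not the binding's implementation: `WithColumnSketch` and its map-of-expressions schema are hypothetical names invented here, and the real method delegates to the native side rather than tracking columns in Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of withColumn's replace-or-append rule (hypothetical, not the real binding). */
final class WithColumnSketch {

    /**
     * Returns a new column map (name -> SQL expression text).
     * The input map is copied, so the "receiver" stays usable.
     */
    static Map<String, String> withColumn(Map<String, String> columns, String name, String expr) {
        Map<String, String> out = new LinkedHashMap<>(columns);
        // LinkedHashMap keeps the original position when an existing key is
        // re-inserted, and places a new key last: replace in place, else append.
        out.put(name, expr);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("id", "id");
        base.put("price", "price");
        base.put("qty", "qty");

        System.out.println(withColumn(base, "price", "price * 2").keySet());   // [id, price, qty]
        System.out.println(withColumn(base, "total", "price * qty").keySet()); // [id, price, qty, total]
        System.out.println(base.keySet());                                     // [id, price, qty]
    }
}
```

The last line of `main` shows the input unchanged after both calls, which is the toy-model counterpart of "the receiver remains usable".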

Out of scope (filed separately):

Are these changes tested?

Yes, 13 new tests, plus two existing close/collect tests extended.

Are there any user-facing changes?

Yes, purely additive. New public API:

  • org.apache.datafusion.UnnestOptions
  • DataFrame.withColumn(String, String)
  • DataFrame.unnestColumns(String...)
  • DataFrame.unnestColumns(UnnestOptions, String...)

No API removals, no deprecations, no behavior change for existing callers.
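
For readers who want a feel for the options holder, here is a hedged sketch of what a class with a single preserveNulls(boolean) knob and a default of true might look like. The class name `UnnestOptionsSketch`, the fluent return type, and the `isPreserveNulls()` accessor are all assumptions made for illustration; the actual `org.apache.datafusion.UnnestOptions` may differ in every one of these details.

```java
/** Hypothetical options holder; not the class added by this PR. */
final class UnnestOptionsSketch {

    // Default of true, as stated in the PR description.
    private boolean preserveNulls = true;

    /** Fluent setter (assumed style): returns this so calls can be chained. */
    UnnestOptionsSketch preserveNulls(boolean value) {
        this.preserveNulls = value;
        return this;
    }

    /** Accessor name is an assumption made for this sketch. */
    boolean isPreserveNulls() {
        return preserveNulls;
    }

    public static void main(String[] args) {
        System.out.println(new UnnestOptionsSketch().isPreserveNulls());                      // true
        System.out.println(new UnnestOptionsSketch().preserveNulls(false).isPreserveNulls()); // false
    }
}
```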

Closes apache#41

withColumn(name, expr) takes a SQL fragment, parses it via
df.parse_sql_expr (the same convention filter(String) uses), and
calls DataFusion's with_column. Replaces a column of the same name
in place; otherwise appends.

unnestColumns(String...) and unnestColumns(UnnestOptions, String...)
route to unnest_columns_with_options. UnnestOptions exposes the
preserveNulls knob (default true, matching upstream). Recursions
are deferred to a follow-up since they need a richer column-pair
representation (RecursionUnnestOption with input/output/depth).

Tests: 11 new transformation tests, 2 UnnestOptions setter tests,
plus the two close/collect tests extended for the new methods. The
unnest tests cover both preserveNulls=true (keep null rows) and
preserveNulls=false (drop null rows). withColumn tests cover both
the append-new and replace-existing branches. make test (90 tests,
0 failures) and cargo clippy/fmt are clean.
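
The two preserveNulls branches covered by the unnest tests can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not the code under test: `UnnestSketch` is a hypothetical name, it flattens a single list-valued column held in memory, and it only illustrates the null-row behaviour described above (keep versus drop).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of unnesting one list column (hypothetical, not the real binding). */
final class UnnestSketch {

    /** Emits one output value per list element; null cells follow preserveNulls. */
    static List<Integer> unnest(List<List<Integer>> column, boolean preserveNulls) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> cell : column) {
            if (cell == null) {
                if (preserveNulls) {
                    out.add(null); // keep the row, carrying a null value
                }
                continue; // with preserveNulls=false the row is dropped
            }
            out.addAll(cell);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> column = Arrays.asList(List.of(1, 2), null, List.of(3));
        System.out.println(unnest(column, true));  // [1, 2, null, 3]
        System.out.println(unnest(column, false)); // [1, 2, 3]
    }
}
```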

@andygrove andygrove left a comment

Thanks @LantaoJin

@andygrove andygrove merged commit 678eee1 into apache:main May 17, 2026
1 check passed
@LantaoJin LantaoJin deleted the feat/dataframe-with-column-unnest branch May 18, 2026 03:27