feat(dataframe): expose withColumn and unnestColumns #54
Merged
andygrove merged 1 commit on May 17, 2026
Closes apache#41

withColumn(name, expr) takes a SQL fragment, parses it via df.parse_sql_expr (the same convention filter(String) uses), and calls DataFusion's with_column. It replaces a column of the same name in place; otherwise it appends.

unnestColumns(String...) and unnestColumns(UnnestOptions, String...) route to unnest_columns_with_options. UnnestOptions exposes the preserveNulls knob (default true, matching upstream). Recursions are deferred to a follow-up since they need a richer column-pair representation (RecursionUnnestOption with input/output/depth).

Tests: 11 new transformation tests, 2 UnnestOptions setter tests, plus the two close/collect tests extended for the new methods. The unnest tests cover both preserveNulls=true (keep null rows) and preserveNulls=false (drop null rows). The withColumn tests cover both the append-new and replace-existing branches.

make test (90 tests, 0 failures) and cargo clippy/fmt are clean.
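The replace-in-place-otherwise-append rule described for withColumn can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: WithColumnModel and its map-of-expressions schema are hypothetical names invented for illustration, nothing here calls DataFusion or the bindings in this PR, and the expression strings are stored rather than parsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the replace-or-append rule (hypothetical, not the PR's code).
 * A "schema" is an ordered map from column name to the expression text that
 * produces it.
 */
final class WithColumnModel {

    /**
     * Returns a new ordered schema. If name already exists, its expression is
     * replaced and the column keeps its position; otherwise the column is
     * appended at the end. The input map is left untouched, mirroring the
     * "receiver remains usable" behavior described in the PR.
     */
    static Map<String, String> withColumn(Map<String, String> columns, String name, String expr) {
        // Copying a LinkedHashMap preserves its iteration order.
        Map<String, String> out = new LinkedHashMap<>(columns);
        // put() on an existing key keeps that key's original position;
        // put() on a new key appends it to the end of the iteration order.
        out.put(name, expr);
        return out;
    }

    /** Convenience helper: column names in schema order. */
    static List<String> names(Map<String, String> columns) {
        return new ArrayList<>(columns.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("id", "id");
        base.put("price", "price");

        // Replace-existing branch: "id" stays first.
        System.out.println(names(withColumn(base, "id", "id + 1")));
        // Append-new branch: "total" goes last.
        System.out.println(names(withColumn(base, "total", "price * 2")));
    }
}
```

The two calls in main correspond to the replace-existing and append-new branches that the withColumn tests are said to cover.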
Which issue does this PR close?
Rationale for this change
DataFrame gained dropColumns and withColumnRenamed in #30, but the most common column-shaping primitive, adding or replacing a column from a SQL expression, was still missing. unnestColumns is in the same family for struct / list flattening. The issue (#41) lists both as a single unit of work.

withColumn and unnestColumns follow the same JNI shape as filter(String) from #19: the SQL fragment is parsed on the native side via df.parse_sql_expr, so no Java-side Expr model is required. That keeps this PR independent of any future joins / Expr-builder work.

What changes are included in this PR?
- DataFrame.withColumn(String name, String expr): replaces a column of the same name in place, otherwise appends. Mirrors DataFusion::DataFrame::with_column. The expression is parsed against this DataFrame's own schema using the same parse_sql_expr convention as filter(String). The receiver remains usable.
- DataFrame.unnestColumns(String... columns): defaults to upstream's UnnestOptions::new() (i.e. preserve_nulls = true).
- DataFrame.unnestColumns(UnnestOptions options, String... columns): explicit options overload. Routes to unnest_columns_with_options upstream.
- UnnestOptions Java class with a single preserveNulls(boolean) knob (default true, matching upstream).
- native/src/lib.rs: withColumnExpr and unnestColumns. Both follow the existing patterns (filterRows / dropColumns); no new imports beyond datafusion::common::UnnestOptions.

Out of scope (filed separately):
- UnnestOptions.recursions: needs a richer RecursionUnnestOption Java class with input_column, output_column, and depth, plus a parallel-array JNI layout. Not in #41's checklist; can land independently.

Are these changes tested?
Yes, 13 new tests, plus two existing close/collect tests extended.
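As a rough picture of what the two preserveNulls cases check, here is a minimal, self-contained model of unnesting a single list column. It is a sketch under assumptions, not the PR's test code: UnnestModel and its nested Options class are hypothetical stand-ins (the fluent return type of the setter is an assumption), and nothing here calls DataFusion. The PR text does not say how empty lists are treated, so the model simply lets an empty list contribute no rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of list unnesting with a preserveNulls switch (hypothetical). */
final class UnnestModel {

    /** Stand-in for an options object with a single knob, defaulting to true. */
    static final class Options {
        private boolean preserveNulls = true;

        /** Fluent setter; returning this is an assumption made for this sketch. */
        Options preserveNulls(boolean value) {
            this.preserveNulls = value;
            return this;
        }

        boolean preserveNulls() {
            return preserveNulls;
        }
    }

    /**
     * Flattens a column of integer lists into one value per output row.
     * A null cell yields a single null row when preserveNulls is true and
     * no row at all when it is false.
     */
    static List<Integer> unnest(List<List<Integer>> column, Options options) {
        List<Integer> rows = new ArrayList<>();
        for (List<Integer> cell : column) {
            if (cell == null) {
                if (options.preserveNulls()) {
                    rows.add(null); // keep the null row
                }
                // otherwise the null row is dropped
            } else {
                rows.addAll(cell); // one output row per list element
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Integer>> column =
                Arrays.asList(Arrays.asList(1, 2), null, Arrays.asList(3));

        System.out.println(unnest(column, new Options()));                      // [1, 2, null, 3]
        System.out.println(unnest(column, new Options().preserveNulls(false))); // [1, 2, 3]
    }
}
```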
Are there any user-facing changes?
Yes, purely additive. New public API:
- org.apache.datafusion.UnnestOptions
- DataFrame.withColumn(String, String)
- DataFrame.unnestColumns(String...)
- DataFrame.unnestColumns(UnnestOptions, String...)

No API removals, no deprecations, no behavior change for existing callers.