deps: Upgrade to DataFusion 54.0.0#4062

Merged
mbutrovich merged 52 commits into main from datafusion-54
Jun 22, 2026
@andygrove andygrove commented Apr 24, 2026

Member

Which issue does this PR close?

Closes #3978.

Rationale for this change

DataFusion 54.0.0 has been released to crates.io, so this PR upgrades Comet main from DataFusion 53.1.0 to 54.0.0. It bumps the dependencies and applies the DF 54 API and behavior adaptations that we developed and validated on the long-running datafusion-54 branch.

What changes are included in this PR?

  • Bump DataFusion deps from 53.1.0 to 54.0.0 in native/Cargo.toml and native/core/Cargo.toml (datafusion, datafusion-datasource, datafusion-physical-expr-adapter, datafusion-spark, datafusion-functions-nested); Cargo.lock regenerated.
  • DF 54 API adaptations (per the 54.0.0 upgrade guide):
    • Remove as_any() overrides and strip .as_any() from call sites, since the method was dropped from PhysicalExpr, ScalarUDFImpl, ExecutionPlan, etc. (now reachable via the Any supertrait).
    • schema_adapter.rs: CastColumnExpr was removed in favor of CastExpr, which no longer exposes input_field(). Derive the input field from the child Column plus the physical file schema, falling back to a synthesized field from the target name and child data type.
    • MemoryPool impls (fair_pool, logging_pool, unified_pool): add Display and name() to satisfy the new supertrait requirements.
  • DF 54 behavior adaptations:
    • pow: mark Pow unsupported and fall back to Spark, due to a DataFusion power correctness regression (datafusion#22598).
    • WeekDay: DataFusion isodow is 1..=7 (Monday=1) while Spark WeekDay is 0..=6 (Monday=0), so subtract 1 from datepart(isodow, ...) (datafusion#22599).
  • Enable Sort Merge Join with filter by default (spark.comet.exec.sortMergeJoinWithJoinFilter.enabled) and mark the config deprecated.
  • Add a SPARK-43113 reproducer in CometJoinSuite (full outer SMJ with NULL in join filter).
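
To illustrate the WeekDay adjustment described above, here is a minimal standalone Rust sketch of the numbering shift. The function name `isodow_to_spark_weekday` is hypothetical and is not part of Comet or DataFusion; the sketch only models the arithmetic (ISO day-of-week 1..=7 mapped to Spark WeekDay 0..=6), not the actual expression change in this PR.

```rust
/// Hypothetical helper (not a Comet or DataFusion API): convert an ISO
/// day-of-week value (Monday = 1 ..= Sunday = 7) into Spark's WeekDay
/// numbering (Monday = 0 ..= Sunday = 6) by subtracting 1.
/// Returns None when the input is outside the ISO range.
fn isodow_to_spark_weekday(isodow: i32) -> Option<i32> {
    if (1..=7).contains(&isodow) {
        Some(isodow - 1)
    } else {
        None
    }
}

fn main() {
    // Print the mapping for every valid ISO day-of-week value.
    for isodow in 1..=7 {
        println!(
            "isodow {} -> Spark WeekDay {:?}",
            isodow,
            isodow_to_spark_weekday(isodow)
        );
    }
}
```

The range check is an addition for the sketch only; in the PR the subtraction is applied directly to the result of `datepart(isodow, ...)`.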

How are these changes tested?

Existing tests, plus the new SPARK-43113 reproducer.

@comphead comphead mentioned this pull request Apr 27, 2026
6 tasks
@comphead comphead marked this pull request as ready for review May 29, 2026 17:01
@mbutrovich mbutrovich marked this pull request as draft May 29, 2026 17:13
@mbutrovich mbutrovich marked this pull request as ready for review June 8, 2026 17:10
@mbutrovich mbutrovich changed the title from "feat: Upgrade to DataFusion 54 [do not merge]" to "deps: Upgrade to DataFusion 54.0.0" Jun 8, 2026
@mbutrovich mbutrovich self-assigned this Jun 8, 2026
@mbutrovich mbutrovich added this to the 0.17.0 milestone Jun 8, 2026
mbutrovich and others added 6 commits June 11, 2026 15:23
# Conflicts:
#	native/Cargo.lock
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/main/scala/org/apache/comet/serde/math.scala
@comphead comphead modified the milestones: 0.17.0, 1.0.0 Jun 19, 2026
    .map(|m| (m.name(), m.as_usize() as i64))
    .for_each(|(name, value)| {
        native_metric_node.metrics.insert(name.to_string(), value);
    });

Contributor

we prob can simplify this to iterate once

  withParquetTable(doubleValues.flatMap(m => doubleValues.map(n => (m, n))), "tbl") {
    // expressions with two args
-   for (expr <- Seq("atan2", "pow")) {
+   for (expr <- Seq("atan2")) {

Contributor

pow has correctness issues; the test itself moved to pow.sql

@mbutrovich mbutrovich left a comment

Contributor

LGTM! I think we might hit a minor performance regression on TPC-DS, but we'll tackle those here and in DF 54.1.

I tracked that regression in #3978. I can open a new issue to track it after we merge 54.0.

@mbutrovich mbutrovich merged commit e72ebd3 into main Jun 22, 2026
69 checks passed
marvelshan pushed a commit to marvelshan/datafusion-comet that referenced this pull request Jul 2, 2026
Co-authored-by: Matt Butrovich <mbutrovich@gmail.com>
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Co-authored-by: comphead <comphead@ukr.net>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

chore: DataFusion 54.0.0

3 participants