fix(udf): pass batch row count to ScalarFunction.evaluate by LantaoJin · Pull Request #57 · apache/datafusion-java

LantaoJin · 2026-05-18T07:02:23Z

Which issue does this PR close?

Closes #56, bug(udf): nullary scalar UDFs cannot determine batch row count.

Rationale for this change

ScalarFunction.evaluate(BufferAllocator, List<FieldVector>) (introduced in #46) is the contract every Java-implemented scalar UDF must satisfy. It must return a FieldVector whose getValueCount() matches the batch row count DataFusion is driving through the operator tree.

For UDFs with at least one argument, the body can read args.get(0).getValueCount() to learn how many rows to produce. For nullary UDFs -- zero arguments, e.g. analogs of random(), pi(), now() -- args is the empty list, and the body has no other channel to learn the row count.

The native side already knows the value: ScalarFunctionArgs::number_rows is read at native/src/udf.rs:100 and used at line 106 to materialise scalar argument columns. The Java bridge (JniBridge.invokeScalarUdf) receives it but uses it only after the fact, to validate the returned vector's length. It is never passed to impl.evaluate(...).

The result: any nullary UDF that DataFusion does not constant-fold (anything declared Volatility.VOLATILE, or STABLE calls in plans the optimizer cannot fold) trips the post-hoc row-count validation as soon as it runs over a batch with more than one row.
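The failure mode can be illustrated with a dependency-free sketch. Everything here is a stand-in rather than project code: double[] replaces Arrow's FieldVector, the allocator is omitted, and OldScalarFunction, OldContractSketch and invoke are hypothetical names that only mirror the shape of the pre-change contract and the bridge's length check.

```java
import java.util.List;

// Stand-in for the pre-change contract (assumption: double[] replaces
// FieldVector and the BufferAllocator argument is dropped).
interface OldScalarFunction {
    double[] evaluate(List<double[]> args);
}

final class OldContractSketch {
    // Unary UDF: the row count is recoverable from the first argument.
    static final OldScalarFunction ADD_ONE = args -> {
        double[] in = args.get(0);
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + 1.0;
        }
        return out;
    };

    // Nullary UDF: args is empty, so the body can only guess a length.
    static final OldScalarFunction PI = args -> new double[] {Math.PI};

    // Hypothetical mirror of the bridge's post-hoc length validation.
    static double[] invoke(OldScalarFunction impl, List<double[]> args, int expectedRowCount) {
        double[] result = impl.evaluate(args);
        if (result.length != expectedRowCount) {
            throw new IllegalStateException(
                "UDF returned " + result.length + " rows, expected " + expectedRowCount);
        }
        return result;
    }
}
```

In this sketch, ADD_ONE passes validation for any batch size, while PI passes only when the batch happens to contain exactly one row.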

What changes are included in this PR?

ScalarFunction.evaluate(BufferAllocator allocator, List<FieldVector> args, int rowCount): adds a third parameter carrying the per-batch row count. This is a source-breaking signature change to a public interface. The repo is pre-release, and only five existing implementations needed an unused-parameter update (four test UDFs in ScalarUdfTest, one in examples/AddOneExample).
JniBridge.invokeScalarUdf (core/src/main/java/org/apache/datafusion/internal/JniBridge.java) now forwards the existing expectedRowCount parameter into impl.evaluate(...). Post-call validation against the same value is unchanged.
No native-side change. The value was already on the wire.
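The forwarding described above can be sketched roughly as follows. This is not the actual JniBridge code: RowCountAwareFunction and ForwardingBridgeSketch are hypothetical names, double[] stands in for FieldVector, and the allocator is omitted so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the post-change contract (assumption: double[] replaces
// FieldVector and the BufferAllocator argument is dropped).
interface RowCountAwareFunction {
    double[] evaluate(List<double[]> args, int rowCount);
}

final class ForwardingBridgeSketch {
    // Hypothetical analog of invokeScalarUdf: forward first, validate after.
    static double[] invokeScalarUdf(
            RowCountAwareFunction impl, List<double[]> args, int expectedRowCount) {
        // New behaviour: the expected count is handed to the UDF body.
        double[] result = impl.evaluate(args, expectedRowCount);
        // Unchanged behaviour: the result is still checked against the same value.
        if (result.length != expectedRowCount) {
            throw new IllegalStateException(
                "UDF returned " + result.length + " rows, expected " + expectedRowCount);
        }
        return result;
    }

    // Nullary constant UDF that sizes its output from rowCount.
    static final RowCountAwareFunction PI = (args, rowCount) -> {
        double[] out = new double[rowCount];
        Arrays.fill(out, Math.PI);
        return out;
    };
}
```

Keeping the validation in place means a UDF that ignores rowCount and returns the wrong length is still rejected, so the extra parameter adds information without weakening the existing check.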

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, a source-breaking signature change to ScalarFunction.evaluate. Implementations of the interface need to add an int rowCount parameter to their evaluate override. Bodies that ignore the new parameter are otherwise unchanged.

Before:

public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args) {
  // ...
}

After:

public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args, int rowCount) {
  // ...
}
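As a rough illustration of what the migration means for UDF bodies, the sketch below shows one implementation that ignores the new parameter and one nullary implementation that depends on it. The types are stand-ins, not the project's API: MigratedScalarFunction and MigrationSketch are hypothetical names, double[] replaces FieldVector, and the allocator is omitted.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Stand-in for the post-change contract (assumption: double[] replaces
// FieldVector and the BufferAllocator argument is dropped).
interface MigratedScalarFunction {
    double[] evaluate(List<double[]> args, int rowCount);
}

final class MigrationSketch {
    // Existing unary UDF: only the signature changes; rowCount goes unused
    // because the first argument already carries the same length.
    static final MigratedScalarFunction ADD_ONE = (args, rowCount) -> {
        double[] in = args.get(0);
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + 1.0;
        }
        return out;
    };

    // Nullary volatile UDF: rowCount is the only source of the output length.
    // Each row gets its own value in [0, 1), as a random() analog would.
    static final MigratedScalarFunction RANDOM = (args, rowCount) ->
        ThreadLocalRandom.current().doubles(rowCount).toArray();
}
```

For the unary case the two sources of the row count agree, so using either is equivalent in this sketch; for the nullary case there is no alternative to rowCount.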

Add an int rowCount parameter to ScalarFunction.evaluate. JniBridge already receives the value from the native side as expectedRowCount for post-call validation; now it is also forwarded into evaluate. For UDFs with at least one argument the value matches what the body could read from args.get(0).getValueCount(). For nullary UDFs (args is empty), this is the only channel that communicates the batch row count, making it possible to implement Volatility.VOLATILE nullary functions like random() / now().

LantaoJin · 2026-05-19T02:24:24Z

Superseded by upstream #64; nullary row count is now available via ScalarFunctionArgs.rowCount().

LantaoJin closed this May 19, 2026

LantaoJin wants to merge 1 commit into apache:main from LantaoJin:fix/udf-nullary-row-count
