Describe the bug
ScalarFunction.argTypes() returns List<ArrowType> and returnType() returns ArrowType (core/src/main/java/org/apache/datafusion/ScalarFunction.java:47, :50). Java Arrow's ArrowType is a leaf marker for the type kind: for primitives like Int32 or Float64 it is self-describing, but for nested types (List, Struct, Map, FixedSizeList) the element / member / key / value types live on the parent Field's children list, not inside ArrowType itself. ArrowType.List is literally a no-field marker class.
That mismatch means a Java UDF author has no way to declare a typed nested signature. The closest they can write is:
```java
public List<ArrowType> argTypes() {
    return List.of(new ArrowType.List()); // says "list" -- cannot say "of Int32"
}
```
When this is passed to SessionContext.registerUdf(ScalarUdf) the registration path at core/src/main/java/org/apache/datafusion/SessionContext.java:385-389 constructs the signature schema as:
```java
fields.add(new Field("return", FieldType.nullable(returnType), null));
for (int i = 0; i < argTypes.size(); i++) {
    fields.add(new Field("arg" + i, FieldType.nullable(argTypes.get(i)), null));
}
```
The null children list is the bug: Arrow's IPC writer rejects the malformed List field during serializeSchemaIpc(...) before the schema ever crosses JNI. The user sees a low-level IllegalArgumentException: Lists have one child Field. Found: none.
This blocks the entire family of nested-type UDFs that exist as built-ins in DataFusion's datafusion-functions-nested crate (array_length, cardinality, array_has, array_position, flatten, map_keys, map_values, arrays_zip, ...). Anyone porting Spark UDFs over ArrayType / StructType / MapType columns to DataFusion-Java hits this on the first attempt.
The Rust API does not have this problem: DataType::List(Arc<Field>) carries the child field inline, so Signature::exact(vec![DataType::List(Arc::new(Field::new("item", DataType::Int32, true)))], ...) round-trips with full structure.
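For comparison, here is a minimal Arrow Java sketch of the same structure. The nested-type marker (ArrowType.List, ArrowType.Map) sits on the parent Field, and the element, key, and value types can only be attached as child Fields. The class and helper names are illustrative only. The child names ("item" for lists; "entries", "key", "value" for maps) follow Arrow's usual conventions. Because the Rust DataType::List carries a full Field, the child's name and nullability are likely part of what Signature::exact compares, so "item" here mirrors the Rust example above.

```java
import java.util.List;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

public class NestedFieldShapes {

    // list<int32>: the List marker is the parent's type; the element type is its single child.
    public static Field listOfInt32(String name) {
        Field item = new Field("item", FieldType.nullable(new ArrowType.Int(32, true)), null);
        return new Field(name, FieldType.nullable(new ArrowType.List()), List.of(item));
    }

    // map<utf8, int32>: Map -> one non-nullable "entries" struct -> ("key", "value").
    public static Field mapOfUtf8ToInt32(String name) {
        Field key = new Field("key", FieldType.notNullable(new ArrowType.Utf8()), null);
        Field value = new Field("value", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Field entries = new Field("entries", FieldType.notNullable(new ArrowType.Struct()), List.of(key, value));
        return new Field(name, FieldType.nullable(new ArrowType.Map(false)), List.of(entries));
    }

    public static void main(String[] args) {
        Field list = listOfInt32("arg0");
        // The ArrowType by itself only says "list"; the element type is reachable only via children.
        System.out.println(list.getType());
        System.out.println(list.getChildren().get(0).getType());

        Field map = mapOfUtf8ToInt32("arg1");
        // The single "entries" child holds the key and value fields.
        System.out.println(map.getChildren().get(0).getChildren().size()); // 2
    }
}
```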
To Reproduce
```java
static final class ListLength implements ScalarFunction {
    public String name() { return "java_list_length"; }
    public List<ArrowType> argTypes() { return List.of(new ArrowType.List()); }
    public ArrowType returnType() { return new ArrowType.Int(32, true); }
    public Volatility volatility() { return Volatility.IMMUTABLE; }
    public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args, int rowCount) {
        /* ... */
    }
}

new SessionContext().registerUdf(new ScalarUdf(new ListLength()));
// throws:
// IllegalArgumentException: Lists have one child Field. Found: none
//   at SessionContext.serializeSchemaIpc(SessionContext.java:398)
//   at SessionContext.registerUdf(SessionContext.java:391)
```
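Until nested types are supported, a registration-time check could at least replace the low-level message with one that names the UDF and the offending argument. The sketch below is hypothetical: requireNestedChildren is not a DataFusion-Java method. It only inspects Arrow Java Field objects and treats List, FixedSizeList, and Map as requiring exactly one child Field, which is Arrow's layout for those types. Struct and other types are deliberately not checked.

```java
import java.util.List;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

public class SignatureFieldCheck {

    // HYPOTHETICAL helper, not part of DataFusion-Java: reject nested signature fields that lack
    // the child Fields Arrow requires, with a message naming the UDF and the field.
    public static void requireNestedChildren(String udfName, Field field) {
        int expected = requiredChildCount(field.getType());
        int actual = field.getChildren().size();
        if (expected >= 0 && actual != expected) {
            throw new IllegalArgumentException("UDF '" + udfName + "': field '" + field.getName()
                + "' has nested type " + field.getType() + " with " + actual
                + " child field(s), expected " + expected
                + "; an ArrowType alone cannot describe element types");
        }
        // Recurse so that e.g. list<list<int32>> is checked at every level.
        for (Field child : field.getChildren()) {
            requireNestedChildren(udfName, child);
        }
    }

    // List, FixedSizeList and Map each have exactly one child Field in Arrow's layout.
    // -1 means "no fixed requirement checked here" (primitives, Struct, everything else).
    private static int requiredChildCount(ArrowType type) {
        if (type instanceof ArrowType.List
            || type instanceof ArrowType.FixedSizeList
            || type instanceof ArrowType.Map) {
            return 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Built the same way the current registration path builds "arg0": List type, null children.
        Field malformed = new Field("arg0", FieldType.nullable(new ArrowType.List()), null);
        try {
            requireNestedChildren("java_list_length", malformed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        Field item = new Field("item", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Field wellFormed = new Field("arg0", FieldType.nullable(new ArrowType.List()), List.of(item));
        requireNestedChildren("java_list_length", wellFormed); // does not throw
    }
}
```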
Expected behavior
A UDF whose argument or return type is a nested Arrow type registers successfully and is callable from SQL with full element-type information preserved end-to-end (Java → JNI → Rust Signature::exact).
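One possible shape for this, sketched under loud assumptions: FieldSignature, argFields(), and returnField() are hypothetical names, not part of DataFusion-Java. The schema builder keeps the "return", "arg0", "arg1", ... layout from SessionContext.java:385-389 but reuses each declared Field's FieldType and children instead of passing null children. Whether the JNI/Rust side then maps these Fields onto Signature::exact unchanged is not verified here.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class FieldSignatureSketch {

    // HYPOTHETICAL: a Field-based counterpart of argTypes()/returnType(); not an existing API.
    public interface FieldSignature {
        List<Field> argFields();
        Field returnField();
    }

    // Same "return", "arg0", "arg1", ... layout as the current registration path, but each
    // declared Field keeps its FieldType and children instead of getting null children.
    public static Schema signatureSchema(FieldSignature sig) {
        List<Field> fields = new ArrayList<>();
        fields.add(renamed("return", sig.returnField()));
        List<Field> args = sig.argFields();
        for (int i = 0; i < args.size(); i++) {
            fields.add(renamed("arg" + i, args.get(i)));
        }
        return new Schema(fields);
    }

    // Copies a Field under a new top-level name; nested children are reused unchanged.
    private static Field renamed(String name, Field field) {
        return new Field(name, field.getFieldType(), field.getChildren());
    }

    // The java_list_length example from this issue, declared as list<int32> -> int32.
    public static FieldSignature listLengthSignature() {
        Field item = new Field("item", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Field values = new Field("values", FieldType.nullable(new ArrowType.List()), List.of(item));
        Field length = new Field("length", FieldType.nullable(new ArrowType.Int(32, true)), null);
        return new FieldSignature() {
            @Override public List<Field> argFields() { return List.of(values); }
            @Override public Field returnField() { return length; }
        };
    }

    public static void main(String[] args) {
        Schema schema = signatureSchema(listLengthSignature());
        // arg0 now carries its element type as a child Field.
        System.out.println(schema.getFields().get(1).getChildren().get(0).getType());
    }
}
```

The current code wraps every type in FieldType.nullable(...). Reusing each declared FieldType instead preserves the author's nullability choice. Whether DataFusion's signature matching needs arguments to be nullable is left open in this sketch.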
Additional context
No response