Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.
Early development: there are no releases yet, and the API will change. Bug reports and contributions are welcome.
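```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

public class Example {
    public static void main(String[] args) throws Exception {
        try (var allocator = new RootAllocator();
             var ctx = new SessionContext()) {
            // Expose a Parquet file to SQL as the table "orders".
            ctx.registerParquet("orders", "/path/to/orders.parquet");
            try (DataFrame df = ctx.sql(
                     "SELECT o_orderpriority, COUNT(*) AS n " +
                     "FROM orders GROUP BY o_orderpriority");
                 // Execute the query; results come back as Arrow batches read through this reader.
                 ArrowReader reader = df.collect(allocator)) {
                while (reader.loadNextBatch()) {
                    // The reader reloads this same root on every call to loadNextBatch().
                    VectorSchemaRoot batch = reader.getVectorSchemaRoot();
                    System.out.print(batch.contentToTSVString());
                }
            }
        }
    }
}
```

SessionContext and DataFrame are AutoCloseable, so open them in try-with-resources as above. Neither is thread-safe, so do not use one instance from multiple threads concurrently.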
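Inside the read loop, each batch is an Arrow VectorSchemaRoot that the ArrowReader reloads on every loadNextBatch() call, so values that must outlive the loop have to be copied out first. The sketch below does that using only the Arrow Java vector API. The BatchRows class and its toRows method are hypothetical helpers for illustration, not part of the DataFusion bindings.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;

// Hypothetical helper, not part of the DataFusion Java API.
public final class BatchRows {
    private BatchRows() {}

    // Copies every row of one batch into plain Java lists so the data stays
    // valid after the reader overwrites the root with the next batch.
    public static List<List<Object>> toRows(VectorSchemaRoot batch) {
        List<List<Object>> rows = new ArrayList<>(batch.getRowCount());
        for (int row = 0; row < batch.getRowCount(); row++) {
            List<Object> values = new ArrayList<>();
            for (FieldVector vector : batch.getFieldVectors()) {
                // getObject returns a Java copy of the value, or null for SQL NULL.
                values.add(vector.getObject(row));
            }
            rows.add(values);
        }
        return rows;
    }
}
```

In the example's loop, calling BatchRows.toRows(batch) and appending the result to a list would retain all rows after the reader is closed, at the cost of boxing every value.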
The full documentation lives under docs/source/ and is built with Sphinx (see docs/README.md for the build steps):
- User guide — installation, the DataFrame and SQL APIs, Parquet ingestion.
- Contributor guide — build, test, code style, and how to bump the DataFusion version.
Requires JDK 17 or later. To build from source, see docs/source/contributor-guide/development.md.
Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.
Apache License 2.0. See LICENSE.txt and NOTICE.txt.