Skip to content

Add metrics to FFI_ExecutionPlan#22136

Merged
timsaucer merged 10 commits into
apache:mainfrom
mailmindlin:feature/ffi-executionplan-metrics
May 14, 2026
Merged

Add metrics to FFI_ExecutionPlan#22136
timsaucer merged 10 commits into
apache:mainfrom
mailmindlin:feature/ffi-executionplan-metrics

Conversation

@mailmindlin
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

FFI_ExecutionPlan exposes most of the ExecutionPlan trait but does not
expose metrics(). As a result, ForeignExecutionPlan::metrics() falls
through to the trait default (None), so anything downstream of an FFI
boundary loses metrics. The most visible breakage is EXPLAIN ANALYZE,
which renders empty metric blocks for foreign plans; anything calling
DisplayableExecutionPlan::with_metrics(...) on a plan tree containing
foreign nodes is similarly affected.

This PR makes foreign plans behave the same as local plans for metric
reporting. Metrics are passed as a snapshot, and all atomic-backed
counters/gauges/timers are read into plain integer fields at marshal time.
Correct because none of the in-tree consumers (AnalyzeExec, DisplayableExecutionPlan) poll metrics
during streaming.

What changes are included in this PR?

  • New module datafusion/ffi/src/metrics.rs with FFI-stable mirrors of
    MetricsSet, Metric, MetricValue (all 16 variants), Label,
    MetricType, MetricCategory, PruningMetrics, RatioMetrics, and
    RatioMergeStrategy, plus bidirectional From conversions.
  • MetricValue::Custom { value: Arc<dyn CustomMetricValue> } is marshalled
    as (name, Display output, as_usize()). On the consumer side it is
    reconstructed as a small FfiCustomMetricValue shim that preserves
    Display and as_usize(). aggregate becomes a no-op (snapshots are
    not mergeable) and as_any only downcasts to the shim — this is the
    documented compromise.
  • FFI_ExecutionPlan gains a new metrics function pointer (appended
    after repartitioned). ForeignExecutionPlan::metrics() is implemented
    to call through it.
  • Two trivial accessors added to RatioMetrics: merge_strategy() and
    display_raw_values() — needed to marshal these otherwise-private fields.
  • chrono added as a direct dependency of datafusion-ffi (used for
    Timestamp ↔ unix-nanos conversion).

Are these changes tested?

Yes. New tests, all passing:

  • 7 unit tests in datafusion/ffi/src/metrics.rs round-trip every
    MetricValue variant individually, plus a full Metric
    (value + labels + partition + type + category) and a MetricsSet.
  • test_ffi_execution_plan_metrics_round_trip in
    datafusion/ffi/src/execution_plan.rs exercises the full FFI path:
    builds an ExecutionPlan with a MetricsSet, wraps it in
    FFI_ExecutionPlan, retrieves metrics via ForeignExecutionPlan::metrics()
    through mock_foreign_marker_id, and asserts the aggregated value matches.
  • EmptyExec test helper extended with with_metrics(MetricsSet).

Existing test suites still pass: cargo test -p datafusion-ffi --all-features and cargo test -p datafusion-ffi --features integration-tests.

Are there any user-facing changes?

Yes — this PR adds public API and makes a binary-incompatible change to
FFI_ExecutionPlan. Please add the api change label.

  • New public types in datafusion_ffi::metrics: FFI_MetricsSet,
    FFI_Metric, FFI_MetricValue, FFI_Label, FFI_MetricType,
    FFI_MetricCategory, FFI_PruningMetrics, FFI_RatioMetrics,
    FFI_RatioMergeStrategy, and FfiCustomMetricValue.
  • ABI break for FFI_ExecutionPlan: a new metrics function pointer
    field is appended. Producers and consumers must be rebuilt together, as
    is already enforced by the major-version check via
    datafusion_ffi::version().
  • New public accessors on RatioMetrics: merge_strategy() and
    display_raw_values(). Non-breaking additions.
  • MetricValue::Custom across FFI is lossy by design: the underlying
    dyn CustomMetricValue is not preserved; only its Display output and
    as_usize() snapshot survive. Documented on FfiCustomMetricValue.

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates ffi Changes to the ffi crate labels May 12, 2026
@timsaucer timsaucer self-requested a review May 13, 2026 12:11
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-ffi v53.1.0 (current)
       Built [  57.069s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.063s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  56.899s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.059s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.320s] 222 checks: 220 pass, 1 fail, 1 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field FFI_ExecutionPlan.metrics in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:75

--- warning repr_c_plain_struct_fields_reordered: struct fields reordered in repr(C) struct ---

Description:
A public repr(C) struct had its fields reordered. This can change the struct's memory layout, possibly breaking FFI use cases that depend on field position and order.
        ref: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/repr_c_plain_struct_fields_reordered.ron

Failed in:
  FFI_ExecutionPlan.clone moved from position 7 to 8, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:79
  FFI_ExecutionPlan.release moved from position 8 to 9, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:82
  FFI_ExecutionPlan.private_data moved from position 9 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:86
  FFI_ExecutionPlan.library_marker_id moved from position 10 to 11, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:91

     Summary semver requires new major version: 1 major and 0 minor checks failed
     Warning produced 1 major and 0 minor level warnings
    Finished [ 116.386s] datafusion-ffi
    Building datafusion-physical-expr-common v53.1.0 (current)
       Built [  19.399s] (current)
     Parsing datafusion-physical-expr-common v53.1.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-physical-expr-common v53.1.0 (baseline)
       Built [  19.783s] (baseline)
     Parsing datafusion-physical-expr-common v53.1.0 (baseline)
      Parsed [   0.022s] (baseline)
    Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.258s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  40.483s] datafusion-physical-expr-common

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 13, 2026
@timsaucer timsaucer requested a review from Copilot May 13, 2026 12:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends datafusion-ffi so ExecutionPlan::metrics() can be accessed across the FFI boundary, preserving execution metrics for foreign plans (e.g., for EXPLAIN ANALYZE and DisplayableExecutionPlan::with_metrics(...)) by marshalling metrics as a snapshot.

Changes:

  • Add FFI-stable mirrors of MetricsSet, Metric, and all MetricValue variants (plus related metric types) with bidirectional conversions.
  • Add a metrics function pointer to FFI_ExecutionPlan and implement ForeignExecutionPlan::metrics() via that callback.
  • Expose RatioMetrics internals needed for marshalling via new accessors; add chrono as a direct datafusion-ffi dependency for timestamp conversion.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
datafusion/physical-expr-common/src/metrics/value.rs Adds RatioMetrics accessors needed to snapshot/serialize merge strategy and display options.
datafusion/ffi/src/metrics.rs Introduces FFI-stable metric types and snapshot conversions, plus round-trip tests.
datafusion/ffi/src/lib.rs Exposes the new metrics module publicly.
datafusion/ffi/src/execution_plan.rs Adds FFI_ExecutionPlan.metrics callback and wires it into ForeignExecutionPlan::metrics(); adds an integration-style test.
datafusion/ffi/Cargo.toml Adds direct chrono dependency for timestamp marshalling.
Cargo.lock Records the new chrono dependency for datafusion-ffi.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment on lines +73 to +76
/// Snapshot the plan's execution metrics. Returns `None` when the
/// underlying [`ExecutionPlan::metrics`] returned `None`.
pub metrics: unsafe extern "C" fn(plan: &Self) -> FFI_Option<FFI_MetricsSet>,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected.

@timsaucer timsaucer added api change Changes the API exposed to users of the crate labels May 13, 2026
Comment thread datafusion/ffi/src/physical_expr/metrics.rs
Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment thread datafusion/ffi/src/metrics.rs Outdated
Comment on lines +383 to +389
fn timestamp_from_ffi(nanos: FFI_Option<i64>) -> Timestamp {
let ts = Timestamp::new();
if let Some(n) = nanos.into_option() {
ts.set(DateTime::<Utc>::from_timestamp_nanos(n));
}
ts
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we lose something if we round trip and the original is not UTC?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be possible. Timestamp is composed of Arc<Mutex<Option<DateTime<Utc>>>>, and is documented as only storing UTC-nanoseconds.

Copy link
Copy Markdown
Member

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

@timsaucer timsaucer added this pull request to the merge queue May 14, 2026
Merged via the queue into apache:main with commit 1af9bd7 May 14, 2026
39 checks passed
@mailmindlin mailmindlin deleted the feature/ffi-executionplan-metrics branch May 14, 2026 13:43
rerun-sync Bot pushed a commit to rerun-io/rerun that referenced this pull request May 15, 2026
### Related
* Probably closes RR-4596
* Part of RR-4594

### What

Surfaces the per-query DataFusion metrics we already record on every
dataset query — plan-time decisions (`filters_pushed_down`,
`filters_applied_client_side`, `entity_path_narrowing_applied`,
`query_chunks`, `query_segments`, etc.) and execution-time fetch
counters (`fetch_grpc_bytes`, `fetch_direct_bytes`,
`fetch_direct_retries_total`, etc.) — to Python users via a
`rerun.experimental.query_metrics()` context manager.

These are the same numbers that already should surface in Rust-side
`EXPLAIN ANALYZE` and PostHog analytics spans, not new instrumentation.

#### Why not just `df.explain(analyze=True)`?
Datafusion's `FFI_ExecutionPlan` is missing methods from `trait
ExecutionPlan`, including the metrics hook. I've opened a PR
apache/datafusion#22136 to fix this, but it's a breaking change and will
require a major version bump and this is a prag******** short-term
solution. We can decide if we want to remove `query_metrics()` if/when
the Datafusion PR lands (which is why it lives in experimental).

#### Example

```python
import tempfile
from pathlib import Path

import numpy as np
from datafusion import col, lit

import rerun as rr
from rerun.experimental import QueryMetrics, query_metrics

def throughput_mb_s(q: QueryMetrics) -> float:
    """Effective fetch throughput (plan + fetch + decode), MB/s."""
    fetched = q.fetch_grpc_bytes + q.fetch_direct_bytes
    if q.total_duration_us <= 0 or fetched <= 0:
        return 0.0
    return (fetched / 1e6) / (q.total_duration_us / 1e6)

rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    # 3 segments × 50 rows, one Points3D entity.
    paths = []
    for i in range(3):
        rec = rr.RecordingStream("demo", recording_id=f"seg{i}")
        for j in range(50):
            rec.set_time("t", sequence=j)
            rec.log("/points", rr.Points3D(rng.random((10, 3))))
        p = Path(tmp) / f"seg{i}.rrd"
        rec.save(str(p))
        paths.append(p)

    with rr.server.Server(datasets={"demo": paths}) as server:
        ds = server.client().get_dataset(name="demo")
        seg0 = ds.segment_ids()[0]

        with query_metrics() as m:
            # Q1: scan everything
            ds.reader(index="t").limit(100).collect()
            # Q2: filter on rerun_segment_id pushes into query_dataset
            ds.reader(index="t").filter(col("rerun_segment_id") == lit(seg0)).limit(100).collect()

        for label, q in zip(["all segments", "one segment"], m.queries):
            fetched = q.fetch_grpc_bytes + q.fetch_direct_bytes
            print(
                f"{label:>13}: chunks={q.query_chunks:>3}  "
                f"bytes={fetched:>6}  pushdown={q.filters_pushed_down}  "
                f"{q.total_duration_us / 1000:>5.1f} ms  "
                f"{throughput_mb_s(q):>5.2f} MB/s"
            )
```
Output:
```
all segments: chunks=  3  bytes= 32096  pushdown=0    0.6 ms  51.85 MB/s
 one segment: chunks=  1  bytes= 11208  pushdown=1    0.6 ms  20.12 MB/s
```

### Testing
* All tests pass
* Added e2e test to check the context manager
* Changed some existing tests to validate pushdown filtering (before
they couldn't assert that before)

### Compatibility

Full compatibility (client-side changes only). I put the context manager
in `rerun.experimental` so we can choose to remove it if the Datafusion
PR lands soon.

---------

Source-Ref: e2b958694d7bad386083108a03fcce6794727f77
Co-authored-by: Emil Ernerfeldt <emil.ernerfeldt@gmail.com>
JSOD11 pushed a commit to JSOD11/datafusion that referenced this pull request May 19, 2026
## Which issue does this PR close?

- Closes apache#22152 

## Rationale for this change

`ExecutionPlan::partition_statistics` and `TableProvider::statistics`
are not currently transported across the DataFusion FFI boundary, so
foreign plans and providers always report `Statistics::new_unknown` /
`None`. This blocks optimizer rules that depend on statistics (e.g. join
reordering, partition pruning) from working with out-of-process plugins,
which defeats the point of exposing those hooks to plugin authors.

`Statistics` contains `Precision<ScalarValue>` for column min/max/sum.
`ScalarValue` is a large enum that's impractical to mirror in
`#[repr(C)]`, so I reuse the existing
`datafusion_proto_common::Statistics` prost encoding — the same pattern
this crate already uses for filter expressions.

## What changes are included in this PR?

- New `datafusion_ffi::statistics` module with
`[de]serialize_statistics` helpers wrapping
the`datafusion_proto_common::Statistics` round-trip.
- New `partition_statistics` field on `FFI_ExecutionPlan` and
corresponding `ExecutionPlan::partition_statistics` impl on
`ForeignExecutionPlan`
- New `statistics` field on `FFI_TableProvider` and corresponding
`TableProvider::statistics` impl on `ForeignTableProvider`. Since the
trait returns `Option<Statistics>`, the implementation cannot propagate
decode errors, it logs a `log::warn!` and triggers a `debug_assert!`.

This PR is expected to be merged after apache#22136 so it includes those
changes.

## Are these changes tested?

Yes:

- Unit tests in `statistics.rs` cover three round-trip cases:
`Statistics::new_unknown`, fully-exact statistics with
`ScalarValue::Int32`/`Int64`/`Utf8` min/max/sum, and mixed
`Precision::Exact`/`Inexact`/`Absent` values.
- A new round-trip integration test in `execution_plan.rs` exercises
`ForeignExecutionPlan::partition_statistics` with both `None` and
`Some(idx)` partitions, against a plan with no statistics (returns
`Statistics::new_unknown`) and a plan with concrete statistics.
- A new round-trip integration test in `table_provider.rs` uses a thin
`TableWithStats` wrapper over `MemTable` to verify both the `None` path
and the concrete `Statistics` path through
`ForeignTableProvider::statistics`.

## Are there any user-facing changes?

This is a breaking ABI change for the `datafusion-ffi` crate:

- `FFI_ExecutionPlan` gains a `partition_statistics` field.
- `FFI_TableProvider` gains a `statistics` field.

Plugins compiled against earlier versions of `datafusion-ffi` will need
to be recompiled. There are no breaking changes to the Rust trait
surface or to `Statistics` itself; downstream `ExecutionPlan` /
`TableProvider` implementations require no changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate auto detected api change Auto detected API change ffi Changes to the ffi crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose ExecutionPlan::metrics() across the FFI boundary

3 participants