[fix](NestedColumnPruning) mixed array metadata-only access returns wrong nested cardinality#64535
[fix](NestedColumnPruning) mixed array metadata-only access returns wrong nested cardinality#64535englefly wants to merge 5 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Consolidate duplicated nested column metadata-only access handling for string, array, and map columns. The refactor routes NULL/OFFSET-only access through one branch and centralizes covered metadata suffix cleanup in a helper that documents each strip case and its implementation path. OFFSET cleanup is split by coverage type so data-path coverage and deeper-OFFSET coverage have separate maintenance boundaries. It also adds a regression case for OFFSET plus NULL on the same prefix and removes an extra blank line in the related test file so FE checkstyle passes.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest
- cd fe && mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No
refactory accessTree.hasStringOffsetOnlyAccess() and other type merge
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: length(map_col['key']) was normalized from [map_col, *, OFFSET] to [map_col, KEYS] and [map_col, VALUES, OFFSET], but the logic only handled root MAP columns. For nested maps such as length(element_at(s, 'm')['key']), FE kept [s, m, *, OFFSET], which BE cannot interpret as split key/value access. This change recursively detects map element lookups whose value side is offset-only and normalizes both root and nested map paths before the common metadata strip logic runs.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- cd fe && mvn checkstyle:check -pl fe-core
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testNestedMapElementLengthKeepsValueOffsetPath
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest
- Behavior changed: No
- Does this need documentation: No
|
run buildall |
|
/review |
There was a problem hiding this comment.
Requesting changes for one correctness issue in nested metadata access-path cleanup.
Critical checkpoint conclusions:
- Goal/test: The PR aims to normalize and strip redundant nested
NULL/OFFSETpaths. The current tests cover deeperOFFSETover shallowerOFFSET, same-prefixOFFSEToverNULL, and deeperNULLover shallowerNULL, but not the mixed shallowerOFFSETplus deeperNULLcase that still breaks BE reader mode selection. - Scope/focus: The change is localized to FE/Nereids nested-column pruning and related unit tests; the issue is in the new common strip pipeline.
- Concurrency/lifecycle/config/compatibility/transactions/writes/observability: Not applicable; this is planner metadata passed to scan descriptors.
- Parallel paths: The cleanup now handles data coverage and deeper-
OFFSETcoverage, but the equivalent deeper-NULLcoverage for a shallowerOFFSETpath is missing. - Testing:
git diff --checkpassed. I did not run FE unit tests becausethirdparty/installed/bin/protocis absent in this checkout, per FE build instructions. - User focus: No additional user-provided review focus was supplied.
TPC-H: Total hot run time: 28702 ms |
TPC-DS: Total hot run time: 169229 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
|
/review |
There was a problem hiding this comment.
Requesting changes for one remaining correctness issue in nested map-value metadata cleanup.
Critical checkpoint conclusions:
- Goal/test: The PR aims to normalize and strip redundant nested NULL/OFFSET access paths so BE readers do not enter metadata-only modes while deeper child data is still needed. The root array cases are now addressed, but the map-value case below is still not covered by the implementation or tests.
- Scope/focus: The changes are localized to FE/Nereids access-path collection/cleanup and focused unit tests, but the affected helper is common to all nested metadata paths.
- Concurrency/lifecycle/config/compatibility/transactions/writes/observability: Not applicable; this is planner metadata passed to scan descriptors, with no new runtime shared state, config, persistence, or protocol fields.
- Parallel paths: Root array/struct same-prefix and deeper-prefix cleanup paths are handled, but the equivalent map value-side path after
*->VALUESnormalization is missing. - Testing: I did not run tests locally in this review. Existing PR context reports FE unit/checkstyle runs, but the failing map-value mixed-meta scenario is not represented.
- User focus: No additional user-provided review focus was supplied.
| } | ||
| List<String> otherPrefix = other.subList(0, other.size() - 1); | ||
| if (hasStrictPrefix(otherPrefix, targetPrefix)) { | ||
| toRemove.add(p); |
There was a problem hiding this comment.
This still leaves the same unsafe BE reader-mode combination after map value normalization, but in a different path shape from the earlier root-array thread.
Reduced plan:
Project(cardinality(element_at(m, 'a')), is_null(element_at(element_at(m, 'a'), 1)))
OlapScan(m: MAP<STRING, ARRAY<INT>>)
AccessPathExpressionCollector emits [m, *, OFFSET] for cardinality(element_at(m,'a')) and [m, *, *, NULL] for the element null check. normalizeMapValueMetaOnlyAccessPaths rewrites only the first path to [m, KEYS] + [m, VALUES, OFFSET]; the deeper null path remains [m, *, *, NULL]. At this line the comparison is a plain lexical prefix check, so [m, *, *] is not treated as a value-side child of [m, VALUES], and [m, VALUES, OFFSET] survives.
BE then splits [m, *, *, NULL] to the value array as [value, *, NULL] and also sends [value, OFFSET]. ArrayFileColumnIterator::set_access_paths sees the value-array OFFSET, switches that iterator to OFFSET_ONLY, and skips the item iterator, so the deeper NULL path is never applied. The IS NULL result can be evaluated from default-filled item/null columns.
Please make the deeper-meta coverage map-aware here as well, e.g. reuse the compareOffsetPrefixCoverage semantics for * vs VALUES while preserving the supplemental [m, KEYS] path, or normalize deeper map-star meta paths to VALUES before this comparison. A unit/regression case with cardinality(map_arr_col['a']) plus element_at(map_arr_col['a'], 1) is null would catch this.
TPC-H: Total hot run time: 29187 ms |
TPC-DS: Total hot run time: 174721 ms |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)