[core] Avoid cross-file blob and vector compaction for data evolution by leaves12138 · Pull Request #7938 · apache/paimon

leaves12138 · 2026-05-22T17:28:58Z

Purpose

This PR prevents standalone Data Evolution dedicated-file compaction from combining blob or vector-store files that belong to different regular data-file row-id ranges.

Root Cause

The compact planner grouped dedicated files from a data compaction group before planning dedicated compact tasks. If blob or vector-store files were compacted across multiple regular data-file ranges without compacting those regular data files into the same row-id range, the compacted dedicated file could overlap several remaining data files.

Conflict detection groups files by overlapping row-id range and filters blob files from the error message, so the failure surfaced as multiple regular data files with different row-id ranges conflicting during COMPACT.
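The overlap described above can be illustrated with a minimal sketch. It is not the Paimon conflict-detection code: `RowIdRangeSketch`, `Range`, and `groupByOverlap` are hypothetical names, and the row-id values are made up for illustration. It only shows why a dedicated file merged across two data-file ranges pulls both data files into one overlapping group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of row-id range overlap grouping; not a Paimon class. */
final class RowIdRangeSketch {

    /** Inclusive row-id range [first, last] with a label identifying the file. */
    static final class Range {
        final String name;
        final long first;
        final long last;

        Range(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }
    }

    /** Groups files so that files whose row-id ranges overlap (directly or transitively) share a group. */
    static List<List<String>> groupByOverlap(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        // Stable sort by range start, so input order is kept among equal starts.
        sorted.sort(Comparator.comparingLong((Range r) -> r.first));

        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentEnd = Long.MIN_VALUE;
        for (Range r : sorted) {
            // A range starting after the current group's end opens a new group.
            if (!current.isEmpty() && r.first > currentEnd) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(r.name);
            currentEnd = Math.max(currentEnd, r.last);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With data files covering rows 0-99 and 100-199 and one blob file per data file, the sketch yields two groups. Replacing the two blob files with a single merged blob covering rows 0-199 yields one group containing both data files, which matches the shape of the conflict reported during COMPACT.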

Changes

Keep cross-data-file blob/vector-store compaction only when the corresponding regular data files are compacted in the same task.
Plan blob/vector-store compaction per containing data file when no regular data-file compaction is triggered.
Update planner tests for both the no-compact and compact-together paths.
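The planning rule in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the planner's actual code: `DedicatedCompactPlanSketch`, `plan`, and `minFileNum` are hypothetical names, files are represented as plain strings, and per-field grouping of blob files is omitted. Only `triggerNormalFile` is a name taken from the review discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the dedicated-file planning rule; names do not mirror the Paimon planner. */
final class DedicatedCompactPlanSketch {

    /**
     * @param dataFiles regular data files in one compaction group
     * @param dedicatedByDataFile blob or vector-store files keyed by their containing data file
     * @param triggerNormalFile whether the regular data files are compacted in the same task
     * @param minFileNum assumed minimum number of files needed to form a compact task
     * @return one list of dedicated files per planned compact task
     */
    static List<List<String>> plan(
            List<String> dataFiles,
            Map<String, List<String>> dedicatedByDataFile,
            boolean triggerNormalFile,
            int minFileNum) {
        List<List<String>> tasks = new ArrayList<>();
        if (triggerNormalFile) {
            // The data files are rewritten into one row-id range in this task,
            // so their dedicated files may be compacted together.
            List<String> all = new ArrayList<>();
            for (String dataFile : dataFiles) {
                all.addAll(dedicatedByDataFile.getOrDefault(dataFile, Collections.emptyList()));
            }
            addIfEnough(tasks, all, minFileNum);
        } else {
            // The data files keep their row-id ranges, so dedicated files are
            // only compacted within the data file that contains them.
            for (String dataFile : dataFiles) {
                addIfEnough(
                        tasks,
                        dedicatedByDataFile.getOrDefault(dataFile, Collections.emptyList()),
                        minFileNum);
            }
        }
        return tasks;
    }

    private static void addIfEnough(List<List<String>> tasks, List<String> files, int minFileNum) {
        if (!files.isEmpty() && files.size() >= minFileNum) {
            tasks.add(new ArrayList<>(files));
        }
    }
}
```

In this sketch, two data files with one blob file each produce no task when `triggerNormalFile` is false and a single combined task when it is true, which corresponds to the no-compact and compact-together paths the planner tests are meant to cover.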

Tests

JAVA_HOME=/opt/zulu8.68.0.21-ca-jdk8.0.362-macosx_aarch64 mvn -pl paimon-core spotless:apply
JAVA_HOME=/opt/zulu8.68.0.21-ca-jdk8.0.362-macosx_aarch64 mvn -pl paimon-core -Dtest=DataEvolutionCompactCoordinatorTest test

JingsongLi

Left comments:

Vector store files have the same bug. Lines 374-383 still collect all vector store files from all data files in the group and compact them together, regardless of whether triggerNormalFile is true. The same cross-file compaction problem applies to vector store files. The fix should be applied symmetrically.
Test is a negative-only assertion. The new test testCompactPlannerDoesNotCompactBlobFilesAcrossDataFiles asserts tasks.isEmpty(), but it would be stronger to also verify that when compactMinFileNum=2 (matching the 2 data files), the blob files DO get compacted together. This proves both the "yes-compact" and "no-compact" paths work. The existing testCompactPlannerWithBlobFiles partially covers this, but the boundary is subtle.
Edge case: single data file with multiple blob files per field. When triggerNormalFile == false, the per-data-file blob compaction loop calls blobFileGroupsToCompact() for each data file individually. If a single data file has, say, 3 small blob files for the same field (from prior partial compactions or writes), this correctly compacts them. Good.
Minor: The else branch iterates all dataFiles and plans blob compaction per file. If dataFiles has, say, 5 files but only 2 have blob files, this incurs 5 iterations, but getOrDefault(..., emptyList()) returns empty for the others and blobFileGroupsToCompact([]) returns empty. This is harmless but slightly wasteful. Not worth fixing.

[core] Avoid cross-file blob compaction for data evolution

eb72fac

leaves12138 marked this pull request as ready for review May 22, 2026 18:00

[core] Avoid cross-file blob compaction for data evolution

6d20e27

leaves12138 force-pushed the codex/fix-de-blob-compact-range branch from b0abd46 to 6d20e27 Compare May 22, 2026 18:09

JingsongLi reviewed May 23, 2026

[core] Address dedicated compact planner review comments

b597fac

leaves12138 changed the title ~~[core] Avoid cross-file blob compaction for data evolution~~ [core] Avoid cross-file blob and vector compaction for data evolution May 23, 2026

leaves12138 wants to merge 3 commits into apache:master from leaves12138:codex/fix-de-blob-compact-range
