
branch-4.0: [fix](delta writer) Fix shared delta writer state lifetime#64512

Open
bobhan1 wants to merge 1 commit into apache:branch-4.0 from bobhan1:branch-4.0-pick-64349


@bobhan1 bobhan1 commented Jun 15, 2026

Contributor

Proposed changes

Backport #64349 to branch-4.0.

This backport keeps a shared DeltaWriterV2 from retaining the creator sink's RuntimeState after another local sink reuses the writer. The 4.0 adaptation preserves branch-local paths and keeps delta-writer profile collection gated by enable_profile() and profile_level() >= 2 at the VTabletWriterV2::close() call site.

Upstream commit: c33e1b3
Backport commit: 27c5336

Testing

  • git diff HEAD^ HEAD --check
  • ./run-be-ut.sh --run --filter=TestVTabletWriterV2.*:DeltaWriterV2PoolTest.* -j100

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?


Issue Number: None

Problem Summary:

Shared `DeltaWriterV2` instances can be reused by multiple local sinks
from the same load. Before this change, the shared writer stored the
`RuntimeState*` from the sink that first created it. If that creator
sink finished and its `RuntimeState` was destroyed while another local
sink continued to reuse the shared writer, `DeltaWriterV2::write()`
could access the destroyed state in the memtable flush-limit
cancellation path, causing a BE crash or ASAN use-after-free.
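
The lifetime hazard described above can be illustrated with a minimal sketch. All names here (`SinkStateStub`, `OldStyleSharedWriterSketch`, `creator_state_gone_on_reuse`) are hypothetical stand-ins, not Doris types. The description says the pre-fix writer stored a raw `RuntimeState*`; the sketch swaps in a `std::weak_ptr` purely so the dangling case can be observed safely instead of invoking undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a per-sink RuntimeState.
struct SinkStateStub {
    bool cancelled = false;
};

// Models the pre-fix shape: the shared writer remembers the state of the
// sink that created it. A weak_ptr replaces the raw pointer only so that
// the dangling case is detectable rather than undefined behavior.
struct OldStyleSharedWriterSketch {
    std::weak_ptr<SinkStateStub> creator_state;

    // True when the remembered creator state no longer exists.
    bool creator_state_is_gone() const { return creator_state.expired(); }
};

// Returns true if the writer's remembered state is gone by the time a
// second sink would reuse the writer.
inline bool creator_state_gone_on_reuse() {
    OldStyleSharedWriterSketch writer;
    {
        auto creator = std::make_shared<SinkStateStub>();
        writer.creator_state = creator;
        assert(!writer.creator_state_is_gone());  // creator sink still alive
    }  // creator sink finishes; its state is destroyed here
    return writer.creator_state_is_gone();
}
```

With a raw pointer in place of the `weak_ptr`, the final check would instead be a read through a dangling pointer, which is the use-after-free that ASAN reports.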

This PR adds a BE unit test that reproduces the lifetime boundary:

- one `VTabletWriterV2` creates the shared `DeltaWriterV2`;
- the creator writer and its `RuntimeState` are destroyed without
cancelling the shared writer;
- a second writer reuses the shared writer and is forced into the
`DeltaWriterV2::write()` flush-limit wait path;
- the old code reads the destroyed creator state, while the fixed code
observes the current writer's cancel state and exits cleanly.

The fix removes the stored `RuntimeState*` from `DeltaWriterV2`. The
shared writer now keeps only the stable `WorkloadGroup` shared pointer
needed by `MemTableWriter` initialization, and `VTabletWriterV2` passes
a per-call cancel checker into `DeltaWriterV2::write()` so cancellation
is evaluated against the current sink.
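
A minimal sketch of the per-call cancel checker pattern follows. It is not the Doris implementation: `SharedWriterSketch`, `SinkStateStub`, `WorkloadGroupStub`, `WriteResult`, and the flush-limit flag are all hypothetical names chosen for illustration. The point it shows is that the writer keeps only long-lived state, while cancellation is supplied by whichever sink is calling `write()`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-ins; none of these are the real Doris types.
struct WorkloadGroupStub { std::string name; };
struct SinkStateStub { std::atomic<bool> cancelled{false}; };  // per-sink state
enum class WriteResult { kOk, kCancelled };

class SharedWriterSketch {
public:
    // Only state that outlives any single sink is captured at construction.
    explicit SharedWriterSketch(std::shared_ptr<WorkloadGroupStub> wg)
            : _workload_group(std::move(wg)) {}

    // The cancel checker is used during this call only and is never stored.
    WriteResult write(const std::function<bool()>& is_cancelled) {
        while (_over_flush_limit.load()) {  // simulated flush-limit wait
            if (is_cancelled()) return WriteResult::kCancelled;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        ++_rows_written;
        return WriteResult::kOk;
    }

    void set_over_flush_limit(bool v) { _over_flush_limit.store(v); }
    int rows_written() const { return _rows_written; }

private:
    std::shared_ptr<WorkloadGroupStub> _workload_group;
    std::atomic<bool> _over_flush_limit{false};
    int _rows_written = 0;
};

// The creator sink goes away; a second sink reuses the writer and is
// cancelled while waiting on the simulated flush limit.
inline WriteResult reuse_after_creator_destroyed() {
    auto writer = std::make_shared<SharedWriterSketch>(
            std::make_shared<WorkloadGroupStub>(WorkloadGroupStub{"wg"}));
    {
        SinkStateStub creator;
        WriteResult first =
                writer->write([&creator] { return creator.cancelled.load(); });
        assert(first == WriteResult::kOk);
    }  // creator state destroyed; the writer holds no pointer to it
    SinkStateStub second;
    second.cancelled.store(true);
    writer->set_over_flush_limit(true);
    return writer->write([&second] { return second.cancelled.load(); });
}
```

In this sketch the second `write()` returns `WriteResult::kCancelled` because the checker reads the second sink's flag. Since the lambda lives only for the duration of the call, the writer never holds a reference that can outlive the sink that supplied it.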

Fix a possible BE crash when shared delta writers are reused by multiple
local sinks.
@bobhan1 bobhan1 force-pushed the branch-4.0-pick-64349 branch from 27c5336 to 27cb2ba June 15, 2026 07:11
@bobhan1 bobhan1 changed the title from "[branch-4.0][fix](delta writer) Fix shared delta writer state lifetime" to "branch-4.0: [fix](delta writer) Fix shared delta writer state lifetime" Jun 15, 2026
@bobhan1 bobhan1 marked this pull request as ready for review June 15, 2026 07:12
@bobhan1 bobhan1 requested a review from morningman as a code owner June 15, 2026 07:12

bobhan1 commented Jun 15, 2026

Contributor Author

run buildall

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 45.83% (11/24) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.46% (19501/36481) |
| Line Coverage | 36.55% (182335/498844) |
| Region Coverage | 33.13% (141635/427523) |
| Branch Coverage | 33.96% (61244/180329) |

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 70.83% (17/24) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.52% (25540/35712) |
| Line Coverage | 54.34% (270594/497970) |
| Region Coverage | 51.96% (224377/431841) |
| Branch Coverage | 53.36% (96558/180962) |


bobhan1 commented Jun 15, 2026

Contributor Author

run nonConcurrent


bobhan1 commented Jun 15, 2026

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor


Automated review result: no blocking issues found.

Critical checkpoints:

  • Correctness/lifetime: DeltaWriterV2 no longer retains the creator sink RuntimeState; current-sink cancellation is checked synchronously during flush-limit waits. Workload-group ownership is preserved through a shared_ptr, and schema/request pointers remain owned by the shared schema object.
  • Concurrency/cancellation: shared writer map close/cancel paths still use the existing use-counting model, and the new cancel callback is not stored beyond the write() call.
  • Profiling/behavior: delta-writer profile collection remains gated by enable_profile() and profile_level() >= 2 at the VTabletWriterV2::close() call site.
  • Tests: the new regression test covers reuse of a shared writer after the creator state is destroyed; pool test updates match the constructor signature. Local check run: git diff --check.
  • Review focus: no additional user-provided review focus was supplied.

I did not find a critical blocking issue in this PR.

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 70.83% (17/24) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.53% (25545/35712) |
| Line Coverage | 54.36% (270694/497970) |
| Region Coverage | 51.97% (224414/431841) |
| Branch Coverage | 53.38% (96594/180962) |
