fix(uffd): read page state inside worker under settleRequests.RLock #2512
ValentaTomas wants to merge 2 commits into test/uffd-stale-source-race-tests
…e from feat/free-page-reporting
Lift the production-side UFFD subsystem closure that the upcoming RPC test harness and race tests depend on, taken verbatim from the tip of feat/free-page-reporting (commit f310273). The closure brings in:
- pageTracker gains the `removed` pageState
- Userfaultfd.Serve() splits readEvents into removes + pagefaults, drains the REMOVE batch under settleRequests.Lock(), then dispatches the pagefault batch
- Worker fault dispatch with switch on pageTracker state (note: the state-snapshot read happens in the parent loop here; that is the bug the stacked PR #2512 fixes)
- wakeupPipe self-pipe to wake the poll loop when a goroutine defers a page fault
- defaultCopyMode override, MapMemoryReadyAddr lifecycle bits
- deferred.go (deferred pagefault list)
- prefault.go updates for prefetch / WP coordination
- remove_test.go (REMOVE-event integration tests)
- cross_process_helpers_test.go / helpers_test.go / fd_helpers_test.go REMOVE-aware variants
This commit is purely a state lift of packages/orchestrator/pkg/sandbox/uffd/userfaultfd/: no other paths touched, no Firecracker bumps, no feature flags, no proto regen, no template-manager / API plumbing.
Required so that test/uffd-rpc-harness-and-race-tests can target main (via feat/uffd-test-scaffolding) instead of being stacked on top of the full free-page-reporting feature PR (#1896).
removed state, and deterministic race tests
#2513
…trix-mode tests
Production:
- pageState gains `removed`; pageTracker gains `get(addr)`.
- Userfaultfd.Serve() splits readEvents into removes + pagefaults,
drains the REMOVE batch under settleRequests.Lock(), then
dispatches the pagefault batch.
- Worker dispatch switches on pageTracker.get(addr): faulted ->
short-circuit, removed -> zero-fill (source = nil), missing ->
copy from u.src. NOTE: the state read and `source` capture
happen in the parent loop BEFORE the worker takes
settleRequests.RLock(). This is the buggy shape that PR #4
fixes; PR #3 adds the deterministic race tests that catch it.
- faultPage gains zero-fill paths for source == nil (4K read =
DONTWAKE zero + WP + wake; 4K write = zero + wake; hugepage =
copy(EmptyHugePage)) and returns (handled bool, err error) so
the worker can defer EAGAIN-on-COPY-during-REMOVE faults.
- wakeupPipe + deferredFaults wake the poll loop when a worker
defers a fault.
- Prefault path checks pageTracker for faulted || removed and
short-circuits.
Tests:
- testConfig gains `removeEnabled bool`; configureApi optionally
enables UFFD_FEATURE_EVENT_REMOVE based on it; the parent
cleanup unregisters the UFFD region when REMOVE is on so munmap
doesn't block on un-acked events.
- PageStates RPC + handlerPageStates now expose `removed`.
- operationModeRemove + executeRemove (madvise MADV_DONTNEED).
- runMatrix(t, tt, body) wraps every existing generic test in two
parallel subtests: remove-off (regression for the no-REMOVE
path that production templates still use) and remove-on
(covers the new code path).
- remove_test.go: REMOVE-specific TestRemove, TestRemoveThenFault,
TestRemoveThenWriteGated, TestWriteThenRemoveGated. Gated tests
are nolint'd as serialised - a paused gated handler keeps a
faulting goroutine suspended in the kernel pagefault path; a
STW GC pause from a parallel test would wait forever for that
goroutine to reach a safe point.
Out of scope (lives in stacked PRs):
- Race tests demonstrating the stale-source bug -> PR #3.
- The fix moving state read into the worker -> PR #4 (#2512).
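The worker dispatch described above (faulted -> short-circuit, removed -> zero-fill, missing -> copy from source) can be sketched as a plain switch. This is a minimal illustration, not the code in userfaultfd.go: the `action` type, the `decide` function, and the numeric values of the states are assumptions made for the example.

```go
package main

import "fmt"

// pageState mirrors the states named in the commit message. The numeric
// values, and "missing" being the zero value, are assumptions.
type pageState int

const (
	missing pageState = iota
	faulted
	removed
)

// action is a hypothetical description of what the worker does next.
type action struct {
	skip   bool   // true when the page is already populated
	source []byte // nil means "zero-fill instead of copying"
}

// decide maps a tracked page state to a worker action.
func decide(state pageState, src []byte) (action, error) {
	switch state {
	case faulted:
		// Already served: nothing to install.
		return action{skip: true}, nil
	case removed:
		// The page was dropped by a REMOVE event: install zeroes.
		return action{source: nil}, nil
	case missing:
		// Never served: copy from the backing source.
		return action{source: src}, nil
	default:
		return action{}, fmt.Errorf("unexpected page state %d", state)
	}
}

func main() {
	src := []byte{0xc3}
	for _, s := range []pageState{missing, faulted, removed} {
		a, err := decide(s, src)
		fmt.Println(s, a.skip, a.source == nil, err)
	}
}
```

The explicit default branch is a design choice of the sketch: an unrecognised state becomes an error instead of silently falling through to a copy.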
…ed-short-circuit race tests
Three race tests built on the unix-socket RPC harness and the test-only
fault-barrier hooks. None use sleeps, retries, or soak loops - each
test installs explicit barriers on the child's worker goroutine, drives
the racing kernel operation from the parent, and asserts on a concrete
post-state.
- TestStaleSourceRaceMissingAndRemove: regression test for the
stale-source bug. Plants a non-zero sentinel into the source page,
parks the worker via barrierBeforeRLock, fires madvise, waits for
the REMOVE batch to commit, releases the worker, then asserts the
page is zero-filled. INTENTIONALLY FAILS on this PR with
`page 1 first byte: want 0 ... got 0xc3` - the worker captured
`source = u.src` in the parent loop before the REMOVE landed and
UFFDIO_COPY'd the planted sentinel into the page after the kernel
had MADV_DONTNEED'd it. PR #4 (#2512) makes this pass by re-reading
state inside the worker under settleRequests.RLock.
- TestNoMadviseDeadlockWithInflightCopy: liveness regression test.
Parks the worker via barrierBeforeFaultPage (holding RLock), fires
madvise, asserts madvise returns within 2s. Passes today; protects
against any future change that accidentally couples readEvents to
settleRequests.
- TestFaultedShortCircuitOrdering: smoke test on the REMOVE-then-
pagefault batch ordering using the gated harness. Pins the
invariant that REMOVE batches drain before pagefault dispatch in
a single Serve iteration.
Test infrastructure additions:
- testHandler.installFaultBarrier / waitFaultHeld / releaseFault
convenience wrappers around the Service.* RPCs from PR #1.
- testConfig.sourcePatcher hook so race tests can plant a
deterministic sentinel into the random source data BEFORE the
content file is written, without depending on the happenstance
value of any randomly-generated byte.
ALL OTHER TESTS in the package still pass on this PR; only the three
sub-tests of TestStaleSourceRaceMissingAndRemove fail (the bug
demonstration).
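The barrier approach these tests rely on (park the worker at a known point, run the racing operation, release the worker) can be illustrated with two channels. This is a hedged sketch of the general pattern only: `barrier`, `wait`, and `runScenario` are hypothetical names, and the real harness drives its barriers over RPC across processes rather than in-process.

```go
package main

import "fmt"

// barrier is a hypothetical one-shot gate: the worker announces that it
// has arrived, then blocks until the test releases it.
type barrier struct {
	held    chan struct{}
	release chan struct{}
}

func newBarrier() *barrier {
	return &barrier{held: make(chan struct{}), release: make(chan struct{})}
}

// wait is called by the worker at the instrumented point.
func (b *barrier) wait() {
	close(b.held) // tell the test the worker is parked
	<-b.release   // block until the test lets the worker continue
}

// runScenario parks a worker, performs the racing step while the worker
// is parked, releases it, and returns the observed order of events.
func runScenario() []string {
	b := newBarrier()
	var events []string
	done := make(chan struct{})

	go func() {
		defer close(done)
		b.wait()
		events = append(events, "worker resumes")
	}()

	<-b.held // worker is now parked at the barrier
	events = append(events, "racing operation")
	close(b.release)
	<-done
	return events
}

func main() {
	fmt.Println(runScenario())
}
```

Because every hand-off is a channel operation, the interleaving is fixed by construction and no sleeps or retries are needed.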
- waitForState: add default case to avoid silent busy-poll on unrecognised pageState values.
- TestFaultedShortCircuitOrdering: rewrite docstring to accurately describe coverage (disjoint-page end-state check, not an ordering invariant guard; same-page ordering is covered by TestStaleSource...).
- TestStaleSourceRaceMissingAndRemove: fix "MISSING-write fault" stale docstring to "MISSING (READ) fault", note both variants fail until #2512.
- Trim verbose multi-line constant and helper comments down to load-bearing WHY.
…uests.RLock
The Serve() loop previously read pageTracker state and captured
`source = u.src` in the parent loop, then dispatched a worker
goroutine. A REMOVE event for the same page that arrived between the
state read and the worker actually acquiring settleRequests.RLock()
would silently leave the worker with a stale `source = u.src`
snapshot. The worker would then UFFDIO_COPY src bytes into a page the
kernel had just MADV_DONTNEED'd, leaving pageTracker == removed and
the kernel page mapped with stale src data — and observably
deadlocking parent madvise() in the orchestrator unit-test suite.
Move the state lookup and source capture inside the goroutine, after
RLock(). The read+act+commit sequence is now atomic with respect to
the REMOVE batch (which takes settleRequests.Lock()).
Makes the three deterministic race tests added in the parent PR pass:
- TestStaleSourceRaceMissingAndRemove (the one that intentionally
failed on the parent PR with `page 1 first byte: want 0 ... got 0xc3`)
- TestNoMadviseDeadlockWithInflightCopy (already passed; now stays green)
- TestFaultedShortCircuitOrdering (already passed; now stays green)
Soak: -count=20 -timeout=30s passes deterministically on this branch.
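The difference between the two shapes can be shown with a small in-process model. This is a sketch under stated assumptions, not the production code: `tracker`, `serve`, and the `readInWorker` flag are hypothetical, a `sync.RWMutex` stands in for settleRequests, and a returned byte stands in for the data that would be installed into the page.

```go
package main

import (
	"fmt"
	"sync"
)

type pageState int

const (
	missing pageState = iota
	removed
)

// tracker is a hypothetical stand-in for the page tracker plus the
// settleRequests lock.
type tracker struct {
	settle sync.RWMutex
	state  pageState
}

// remove commits the removed state under the write lock, like the
// REMOVE batch described in the commit message.
func (t *tracker) remove() {
	t.settle.Lock()
	t.state = removed
	t.settle.Unlock()
}

// serve returns the byte the worker would install. With readInWorker
// false the state is snapshotted in the parent before dispatch (the
// buggy shape); with true it is read inside the worker under RLock.
func serve(t *tracker, src byte, readInWorker bool, racing func()) byte {
	snapshot := t.state // parent-side read, taken before the race window

	racing() // a REMOVE lands after the snapshot, before the worker runs

	result := make(chan byte)
	go func() {
		t.settle.RLock()
		defer t.settle.RUnlock()
		state := snapshot
		if readInWorker {
			state = t.state // fresh read, cannot interleave with remove()
		}
		if state == removed {
			result <- 0 // zero-fill
			return
		}
		result <- src // copy from source
	}()
	return <-result
}

func main() {
	buggy := &tracker{}
	fixed := &tracker{}
	fmt.Printf("parent-read: %#x, worker-read: %#x\n",
		serve(buggy, 0xc3, false, buggy.remove),
		serve(fixed, 0xc3, true, fixed.remove))
}
```

In this model the parent-read variant installs the stale source byte after the removal, while the worker-read variant observes `removed` and zero-fills.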
…onrpc over unix socket (#2519)
Replace the cross-process userfaultfd test harness's pipes + signals (`SIGUSR1` shutdown, `SIGUSR2` page-state snapshot, ready/offsets/gate-cmd/gate-sync pipes) with one Unix socket carrying stdlib `net/rpc` + `net/rpc/jsonrpc`. The userfaultfd and the rpc socketpair half are passed via `ExtraFiles`.
Production change: one `atomic.Pointer[func(uintptr, faultPhase)]` field on `Userfaultfd` and three nil-checked inline call sites. Test builds install the hook via `SetTestFaultHook` defined in a `_test.go` file.
Stacked follow-ups:
- `UFFD_EVENT_REMOVE` handling + matrix tests: #2520
- Stale-source / madvise-deadlock / faulted-short-circuit race tests: #2521
- Stale-source race fix: #2512
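A nil-checked hook stored in an `atomic.Pointer` might look roughly like the following. This is an illustrative sketch only: the `handler` type, the `setHook` and `fire` methods, and the phase constant names are assumptions, and the real `SetTestFaultHook` lives in a `_test.go` file that is not shown here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// faultPhase identifies an instrumented point; these names are hypothetical.
type faultPhase int

const (
	phaseBeforeRLock faultPhase = iota
	phaseBeforeFaultPage
)

// handler stands in for the production type that owns the hook field.
type handler struct {
	hook atomic.Pointer[func(addr uintptr, phase faultPhase)]
}

// setHook installs or clears the hook; passing nil clears it.
func (h *handler) setHook(f func(uintptr, faultPhase)) {
	if f == nil {
		h.hook.Store(nil)
		return
	}
	h.hook.Store(&f)
}

// fire is what an inline call site would do: one atomic load, one nil
// check, and a call only when a hook is installed.
func (h *handler) fire(addr uintptr, phase faultPhase) {
	if f := h.hook.Load(); f != nil {
		(*f)(addr, phase)
	}
}

func main() {
	h := &handler{}
	h.fire(0x1000, phaseBeforeRLock) // no hook installed: no-op

	h.setHook(func(addr uintptr, phase faultPhase) {
		fmt.Printf("hook: addr=%#x phase=%d\n", addr, phase)
	})
	h.fire(0x2000, phaseBeforeFaultPage)
}
```

With no hook installed, each call site costs a single atomic load and a branch, which keeps the production path free of test-only behaviour.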
Summary
Move the `pageTracker.get(addr)` switch and the `source = u.src` capture from the parent `Serve()` loop into the worker goroutine, after `settleRequests.RLock()`. Read + act + commit is now atomic with respect to the REMOVE batch.
This PR is deliberately split from #2521 so each PR has a single concern: #2521 lands the deterministic regression tests (the red CI is the bug demo); this PR lands the one-file production fix on top and turns those tests green.
Changes
`userfaultfd.go` only (+35 / -27). Conceptually three lines moved and one `continue` turned into `return nil` to fit the goroutine closure.
Root cause
Pre-fix, a `UFFD_EVENT_REMOVE` arriving between the parent loop reading `pageTracker.get(addr)` / capturing `source = u.src` and the worker getting onto a CPU would commit `removed` under `settleRequests.Lock()` first. The worker would then `UFFDIO_COPY` the stale source bytes into the page the kernel had just `MADV_DONTNEED`'d and overwrite `removed` with `faulted`. User-visible symptoms: orchestrator parent `madvise()` blocking forever and pages reappearing with stale src bytes after a remove.
Stack
- `test/uffd-stale-source-race-tests` (test(uffd): add deterministic UFFD stale-source race tests #2521)
- `main`
Test plan
- `go build ./...`
- `go vet ./pkg/sandbox/uffd/...`
- `golangci-lint run ./pkg/sandbox/uffd/userfaultfd/... ./pkg/sandbox/uffd/testutils/...`
- `sudo GOMAXPROCS=2 go test -race -timeout 15m -count=1 ./pkg/sandbox/uffd/userfaultfd/...` -- all race tests added by #2521 (test(uffd): add deterministic UFFD stale-source race tests) now pass, including `TestStaleSourceRaceMissingAndRemove/{4k,hugepage}`.
- `sudo go test -count=20 -timeout=30s -run 'TestStaleSourceRaceMissingAndRemove|TestNoMadviseDeadlockWithInflightCopy|TestFaultedShortCircuitOrdering' ./pkg/sandbox/uffd/userfaultfd/...` -- deterministic pass.