test(uffd): rapid state-machine + chaos-source coverage#2544
ValentaTomas wants to merge 5 commits into feat/uffd-remove-events-matrix
Promotes rapid from an indirect pin at v1.2.0 to a direct dependency at v1.3.0, which the new TestRapidStateMachine property-based suite needs. rapid does NOT appear in non-test builds (verified via go list -deps).
Adds TestRapidStateMachine: a pgregory.net/rapid property-based fuzzer that drives random read / write / madvise(MADV_DONTNEED) sequences against a live Userfaultfd handler running in-process (no child RPC server).
The state machine tracks 4 model states (missing / faulted / zero-faulted / removed) and validates per-action invariants after each step:
- read/write on missing/faulted pages: state → faulted, content = source data
- read/write on removed pages: state → zero-faulted, content = zero fill
- madvise on faulted pages: state → removed (waits for REMOVE event drain)
Both pagesize arms (4 KiB, 2 MiB hugepage) × both removeEnabled modes run as parallel subtests via runMatrix.
Also adds directHandler, a lightweight in-process UFFD harness used by both the rapid and chaos tests. It creates the mmap, configures the fd, and runs Userfaultfd.Serve in a background goroutine; t.Cleanup handles teardown.
Reproduce a failure:
RAPID_SEED=<seed> go test -run=TestRapidStateMachine ./pkg/sandbox/uffd/userfaultfd/...
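The per-action invariants above can be read as a small transition table. The following is a minimal sketch of such a model as a pure function, not the code from this PR: the names pageState, action, and next are hypothetical, and every transition the commit message does not list is assumed here to leave the state unchanged.

```go
package main

import "fmt"

// pageState mirrors the four model states named in the commit message.
type pageState int

const (
	missing pageState = iota
	faulted
	zeroFaulted
	removed
)

var stateNames = [...]string{"missing", "faulted", "zero-faulted", "removed"}

func (s pageState) String() string { return stateNames[s] }

// action is a hypothetical enumeration of the fuzzer's steps.
type action int

const (
	actRead action = iota
	actWrite
	actMadvise
)

// next returns the expected model state after applying a to a page in state s.
// Only the three transitions listed in the commit message are encoded;
// anything else keeps the state as-is, which is an assumption of this sketch.
func next(s pageState, a action) pageState {
	switch a {
	case actRead, actWrite:
		switch s {
		case missing, faulted:
			return faulted // content must equal the source data
		case removed:
			return zeroFaulted // content must be zero fill
		}
	case actMadvise:
		if s == faulted {
			return removed
		}
	}
	return s
}

// wantZero reports whether a page in state s is expected to read as zeros.
func wantZero(s pageState) bool { return s == zeroFaulted }

func main() {
	s := missing
	for _, a := range []action{actRead, actMadvise, actWrite} {
		s = next(s, a)
		fmt.Println(s, "zero-filled:", wantZero(s))
	}
}
```

The walk in main goes missing → faulted → removed → zero-faulted, which is the remove-then-re-fault path that motivates the fourth state. A property-based test would compare this model against the handler's observed state after every generated action.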
Wraps block.Slicer with a seedable chaosSource that injects uniform
random [0, 50ms] latency per Slice call, fires 64 concurrent
MADV_POPULATE_READ goroutines, triggers Shutdown, and asserts
teardown completes within 5 seconds.
Guards against Close↔Serve drain regressions where a slow in-flight worker
would otherwise wedge the wg.Wait() drain in the Serve exit path.
Both pagesize arms × both removeEnabled modes run as parallel subtests
via runMatrix. Seed is logged at test start; override with
UFFD_CHAOS_SEED=<n>.
Reproduce a failure:
UFFD_CHAOS_SEED=<seed> go test -run=TestChaosCloseTerminatesUnderLatency \
./pkg/sandbox/uffd/userfaultfd/...
PR Summary: Low Risk. Reviewed by Cursor Bugbot for commit c28fc1e.
Folded into #2520. Rapid + chaos tests now use the existing cross-process testharness instead of an in-process handler.
Stacks on feat/uffd-remove-events-matrix. Adds two seedable test suites: a property-based state-machine fuzzer and a chaos-latency teardown test. Will rebase onto refactor/uffd-test-child-owned-memory once it lands (that branch is the proper base because both tests use t.Parallel() pressure, which is only safe with the syscall-based STW fix it carries).
Architecture note
Both new tests use a direct parent-side handler (directHandler) rather than the cross-process RPC harness. This means:
- … Serve goroutine all live in the test process.
- … Serve).
- pageStateEntries() is called directly on *Userfaultfd without RPC round-trips.
- serveDone chan struct{} lets both the test body and t.Cleanup wait concurrently without consuming the channel.
TestRapidStateMachine
Property-based state-machine fuzzer using pgregory.net/rapid v1.3.0. Drives random sequences of read / write / madvise(MADV_DONTNEED) actions against a live handler; the model tracks per-page state (missing / faulted / zero-faulted / removed). The fourth state, zero-faulted, covers pages that are removed and then re-faulted: these are zero-filled rather than populated from source. Rapid shrinks any failing sequence to a minimal counterexample.
Both pagesize arms (4 KiB, 2 MiB) × both removeEnabled modes run as parallel subtests via runMatrix.
Reproduce a failure:
RAPID_SEED=<seed> go test -run=TestRapidStateMachine ./pkg/sandbox/uffd/userfaultfd/...
TestChaosCloseTerminatesUnderLatency
Wraps block.Slicer with a chaosSource that injects uniform random [0, 50ms] latency per Slice call. Fires 64 concurrent MADV_POPULATE_READ goroutines, then triggers Shutdown and asserts teardown completes within 5s. Catches Close↔Serve drain ordering regressions where a slow worker would otherwise wedge wg.Wait().
After Serve returns, the UFFD fd is closed so any goroutines still blocked on unresolved faults receive EFAULT and unblock gracefully.
Both pagesize arms × both removeEnabled modes run as parallel subtests via runMatrix.
Reproduce a failure:
UFFD_CHAOS_SEED=<seed> go test -run=TestChaosCloseTerminatesUnderLatency ./pkg/sandbox/uffd/userfaultfd/...
Notes
- pgregory.net/rapid is added as a direct test-only dep at v1.3.0; verified it does not appear in non-test builds (go list -deps ./... | grep pgregory returns nothing).
- t.Parallel() on all subtests. The direct in-process handler avoids the Go-runtime↔UFFD STW deadlock that plagued the cross-process tests (all page faults now happen via MADV_POPULATE_READ/WRITE in _Gsyscall, which is always at a GC safe point).
- … UFFD_API syscall). Without root, they skip gracefully.
- sudo go test -race -count=3 -timeout=180s -run='TestRapidStateMachine|TestChaosCloseTerminatesUnderLatency' ./pkg/sandbox/uffd/userfaultfd/... : PASS (3.9s).