feat(uffd): handle UFFD_EVENT_REMOVE; track per-page state; race-safe COPY#2520
feat(uffd): handle UFFD_EVENT_REMOVE; track per-page state; race-safe COPY#2520ValentaTomas wants to merge 4 commits intofeat/block-state-trackerfrom
Conversation
PR SummaryHigh Risk Overview Updates prefaulting to respect page state and mark prefaulted pages as Reviewed by Cursor Bugbot for commit 53f9544. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da912303d8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
da91230 to
efa01a5
Compare
…ernel redeliver Folds audit findings #1 and #7 into one commit since they share the same error arm in faultPage. The kernel surfaces concurrent mm churn (e.g. balloon-driven madvise(MADV_DONTNEED), mremap, fork against the same mm) through UFFDIO_COPY in two distinct ways: as an EAGAIN errno from the syscall, or — once UFFD_FEATURE_EVENT_REMOVE is enabled — through the partial-copy convention where the syscall returns 0 and cpy.copy carries either -EAGAIN or 0..pagesize. Hugetlb pages can also surface a positive short copy if a fault preempts the operation mid-page (#7). Pre-#2520 the latter path went through fmt.Errorf("UFFDIO_COPY copied N bytes...") and fell into the catch-all writeErr != nil arm — which calls onFailure() / fdExit.SignalExit(), tears the uffd serve loop down, and crashes the sandbox the moment the guest touches an unmapped page. The pre-existing errno-EAGAIN soft handler covered only the syscall errno path. Move the partial-copy classification into a small helper so both surfaces collapse onto the existing EAGAIN-returning-(false, nil) branch in faultPage. No retry budget — matches Firecracker's reference handler in src/firecracker/examples/uffd/uffd_utils.rs (Err(PartiallyCopied(n)) if n == 0 || n == -EAGAIN ⇒ return false). Add a uffd.copy_eagain span attribute for observability. Tests: unit-test classifyCopyResult directly. faultPage doesn't expose an Fd seam to mock UFFDIO_COPY without an interface refactor that would materially expand the diff; per the audit's "smallest pragmatic test" guidance the classifier covers the new branching and the existing cross-process matrix tests cover the integration path.
…onrpc over unix socket (#2519) Replace the cross-process userfaultfd test harness's pipes + signals (`SIGUSR1` shutdown, `SIGUSR2` page-state snapshot, ready/offsets/gate-cmd/gate-sync pipes) with one Unix socket carrying stdlib `net/rpc` + `net/rpc/jsonrpc`. The userfaultfd and the rpc socketpair half are passed via `ExtraFiles`. Production change: one `atomic.Pointer[func(uintptr, faultPhase)]` field on `Userfaultfd` and three nil-checked inline call sites. Test builds install the hook via `SetTestFaultHook` defined in a `_test.go` file. Stacked follow-ups: - `UFFD_EVENT_REMOVE` handling + matrix tests — #2520 - Stale-source / madvise-deadlock / faulted-short-circuit race tests — #2521 - Stale-source race fix — #2512
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…state Replace the map-based pageTracker with block.StateTracker[pageState], a roaring-bitmap-backed tracker with O(1) range ops. pageState gains a third value, removed, which is wired at the type level but not yet written anywhere -- #2520 adds the REMOVE-event handler that produces it. Page indices are computed at the call site via header.BlockIdx. pageStateEntries is updated to iterate the exported bitmaps so the cross-process test harness keeps working.
68037d2 to
ceec7d6
Compare
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…state Replace the map-based pageTracker with block.StateTracker[pageState], a roaring-bitmap-backed tracker with O(1) range ops. pageState gains a third value, removed, which is wired at the type level but not yet written anywhere -- #2520 adds the REMOVE-event handler that produces it. Page indices are computed at the call site via header.BlockIdx. pageStateEntries is updated to iterate the exported bitmaps so the cross-process test harness keeps working. Inline the 3-line pageState enum into userfaultfd.go and drop the dedicated page_tracker.go now that pageTracker is gone. Convert block.StateTracker's NewStateTracker / SetRange API from panics to errors. Distinct-state validation and unsupported-state checks now return fmt.Errorf descriptors; the userfaultfd-side init propagates the constructor error through NewUserfaultfdFromFd, and the SetRange call in the worker path logs and continues since these errors only fire on programming bugs.
95659d3 to
bf4fc62
Compare
ceec7d6 to
9ce0941
Compare
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ce09411f0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Production:
- UFFDIO_REGISTER_MODE_REMOVE is requested so the kernel reports
MADV_DONTNEED'd pages via UFFD_EVENT_REMOVE.
- Userfaultfd.Serve splits read events into removes + pagefaults,
drains the REMOVE batch under settleRequests.Lock (calling
pageTracker.SetRange(.., removed) with BlockIdx-computed indices),
then dispatches the pagefault batch.
- Worker dispatch switches on pageTracker.Get(idx): faulted ->
short-circuit, removed -> zero-fill (source = nil), missing ->
copy from u.src. The state read happens inside the worker under
settleRequests.RLock so a concurrent REMOVE can't slip between
the read and the install.
- faultPage gains zero-fill paths for source == nil (4K read =
DONTWAKE zero + WP + wake; 4K write = zero + wake; hugepage =
copy(EmptyHugePage)) and returns (handled, err) so the worker can
defer UFFDIO_COPY EAGAIN back into a deferredFaults queue.
- wakeupPipe + deferredFaults wake the poll loop when a worker
defers, so a deferred fault doesn't sit waiting for an unrelated
UFFD event. The received uffd fd is marked FD_CLOEXEC.
- Prefault short-circuits for faulted || removed.
Tests:
- testConfig gains removeEnabled; the parent unregisters the UFFD
region on cleanup when REMOVE is on so munmap doesn't block on
un-acked events.
- Page-state wire format exposes removed via helpers_test.go.
- operationModeRemove + executeRemove (madvise MADV_DONTNEED).
- runMatrix wraps every existing generic test in remove-off and
remove-on subtests so the no-REMOVE path (still used by
production templates) stays covered while the new path is
exercised. The matrix-level t.Parallel() is intentionally
omitted to cap peak concurrency in CI.
- remove_test.go: TestRemove, TestRemoveThenFault,
TestRemoveThenWriteGated, TestWriteThenRemoveGated. Gated tests
are //nolint:tparallel — a paused gated handler keeps a faulting
goroutine suspended in the kernel pagefault path; a STW GC pause
from a parallel test would wait forever for that goroutine to
reach a safe point.
- race_test.go: deterministic stale-source / madvise-deadlock /
faulted-short-circuit regressions, serialised, with the
FD_CLOEXEC and UFFDIO_COPY-EAGAIN fixes covered.
…state Replace the map-based pageTracker with block.StateTracker[pageState], a roaring-bitmap-backed tracker with O(1) range ops. pageState gains a third value, removed, which is wired at the type level but not yet written anywhere -- #2520 adds the REMOVE-event handler that produces it. Page indices are computed at the call site via header.BlockIdx. pageStateEntries is updated to iterate the exported bitmaps so the cross-process test harness keeps working. Inline the 3-line pageState enum into userfaultfd.go and drop the dedicated page_tracker.go now that pageTracker is gone. Convert block.StateTracker's NewStateTracker / SetRange API from panics to errors. Distinct-state validation and unsupported-state checks now return fmt.Errorf descriptors; the userfaultfd-side init propagates the constructor error through NewUserfaultfdFromFd, and the SetRange call in the worker path logs and continues since these errors only fire on programming bugs.
bf4fc62 to
02f8da8
Compare
9ce0941 to
5896b7b
Compare
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
A worker holding settleRequests.RLock must never block readEvents, because madvise(MADV_DONTNEED) blocks the producer until userspace reads the UFFD_EVENT_REMOVE — and the producer can be the FC balloon thread that other syscalls depend on. Use a dedicated readSerial mutex (not settleRequests) to serialize serve-loop iterations with snapshot-time Export, while keeping the existing settleRequests discipline (workers RLock, REMOVE batch Lock) intact so readEvents remains lock-free relative to workers. Restores liveness for TestNoMadviseDeadlockWithInflightCopy while closing the read-vs-apply race that motivated the prior buggy commit (345f7e9, now amended).
345f7e9 to
db978d4
Compare
…loon Adds the FC-side integration plumbing for free page reporting on top of the UFFD REMOVE-event handling in #2520: - template-manager proto: optional bool freePageReporting (field 17). - TemplateConfig + sandbox.Config gain a FreePageReporting bool that flows from template create → build phases (base/steps/finalize) → sandbox factory → fc.Process.Create. - fc.apiClient.enableFreePageReporting calls PUT /balloon with free_page_reporting=true after entropy setup and before VM start. - fcversion.HasFreePageReporting gates rollout to FC v1.14+. - Adds free-page-reporting LaunchDarkly feature flag. - create-build CLI: --free-page-reporting flag, defaults to enabled when FC version supports it. - smoketest: opportunistically enables FPR when the FC version under test supports it. UFFD-side changes (REMOVE handling, page tracker, race tests, fix) remain in #2520; this PR is purely the production rollout path.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 53f9544. Configure here.

Handles
UFFD_EVENT_REMOVEso balloon-deflate /madvise(MADV_DONTNEED)transitions pages to aremovedstate instead of leaving stalefaultedmappings. Drains the REMOVE batch undersettleRequests.Lock; workers holdsettleRequests.RLockacross the state read →UFFDIO_COPY→SetRangesequence so a concurrent REMOVE can't slip between the read and the install. Soft-failsUFFDIO_COPYEAGAIN / partial copies onto adeferredFaultsqueue and wakes the poll loop via a self-pipe.Depends on #2545.