Skip to content

refactor(uffd-tests): pre-fault uffd-registered pages via madvise to eliminate STW-deadlock class#2542

Closed
ValentaTomas wants to merge 2 commits intofeat/uffd-remove-events-matrixfrom
refactor/uffd-test-child-owned-memory
Closed

refactor(uffd-tests): pre-fault uffd-registered pages via madvise to eliminate STW-deadlock class#2542
ValentaTomas wants to merge 2 commits intofeat/uffd-remove-events-matrixfrom
refactor/uffd-test-child-owned-memory

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

Stacked on #2520. Eliminates the Go-runtime UFFD STW-deadlock class without splitting the harness further or removing parallelism.

Problem

Test goroutines in the parent read UFFD-registered memory via plain Go (bytes.Equal(memoryArea[...], ...)). When the page is missing, the kernel parks the OS thread in TASK_KILLABLE and Linux refuses to deliver SIGURG. Go's STW (e.g. triggered by a concurrent goroutine allocating during a barrier hold) waits for that thread to reach a safe point forever.

Fix

Drive the actual fault through a syscall — unix.Madvise(page, MADV_POPULATE_READ/WRITE) — so the goroutine is in _Gsyscall while the kernel is parked. The Go runtime's entersyscall transitions the goroutine to _Gsyscall before the syscall, detaching it from a P; STW does not wait on _Gsyscall goroutines, so GC completes immediately even if the kernel parks the thread inside the pagefault for arbitrarily long. Once madvise returns, the page is resident and the subsequent plain Go bytes.Equal / copy does not fault.

Why MADV_POPULATE_* and not pread/pwrite on /proc/self/mem

/proc/self/mem goes through access_remote_vm -> get_user_pages_remote with FOLL_FORCE | FOLL_REMOTE, which intentionally bypasses UFFD (so a debugger cannot deadlock on a UFFD-controlled page). Empirically, pread on a UFFD-MISSING page returns EIO. MADV_POPULATE_READ/MADV_POPULATE_WRITE (Linux 5.14+) use get_user_pages (no FOLL_REMOTE), so UFFD fires normally.

Knock-on cleanups

  • Restore t.Parallel() on TestStaleSourceRaceMissingAndRemove and TestNoMadviseDeadlockWithInflightCopy (reverts the workaround in feat(uffd): handle UFFD_EVENT_REMOVE; track per-page state; race-safe COPY #2520's caf0377dc).
  • Drop the //nolint:paralleltest,tparallel rationale on TestFaultedShortCircuitOrdering, TestRemoveThenWriteGated, and TestWriteThenRemoveGated. Pure cleanup — the hazard is gone.

Notes

  • Production code untouched.
  • Cross-process harness unchanged: parent still owns the mmap and UFFD registration, child still serves events.
  • unix.Madvise(MADV_DONTNEED) already runs in syscall mode, so REMOVE-trigger paths needed no changes.
  • The race-test direct page-content assertion (page[0] after a barrier-released read) also pre-faults via MADV_POPULATE_READ so an ad-hoc load on a removed page cannot re-introduce the deadlock.

@cursor
Copy link
Copy Markdown

cursor Bot commented May 3, 2026

PR Summary

Medium Risk
Test harness behavior changes to drive UFFD faults via unix.Madvise(MADV_POPULATE_*) and re-enables parallel execution, which could introduce CI flakiness or failures on kernels lacking MADV_POPULATE_* support.

Overview
Updates the UFFD cross-process tests to pre-fault UFFD-registered pages via unix.Madvise(MADV_POPULATE_READ/WRITE) before any direct Go reads/writes, preventing Go runtime STW/GC deadlocks caused by goroutines blocking in the kernel fault path. With faults now driven through syscalls, previously-serialized race/remove/gated tests are switched back to t.Parallel(), and one race assertion also pre-populates the page before directly reading page[0].

Reviewed by Cursor Bugbot for commit b8500e9. Bugbot is set up for automated code reviews on this repo. Configure here.

…V_POPULATE_*)

Eliminate the Go-runtime ↔ UFFD STW-deadlock class without changing the
two-process harness or production code: in executeRead/executeWrite, run
the actual fault through a syscall (MADV_POPULATE_READ / _WRITE) instead
of the implicit fault triggered by bytes.Equal / copy. Once madvise
returns the page is resident, so the subsequent Go memory ops never fault.

Why a syscall is necessary: a plain Go load against a uffd-MISSING page
parks the OS thread in TASK_KILLABLE while the runtime still considers
the goroutine _Grunning, so any concurrent STW (GC, ReadMemStats, etc.)
waits for the thread to reach a safe point — forever. Going through
unix.Madvise puts the goroutine in _Gsyscall before the kernel parks the
thread, which is preemptible by STW.

MADV_POPULATE_* (Linux 5.14+) is the correct syscall for this: its
fault path uses get_user_pages (no FOLL_REMOTE), so UFFD fires normally
— unlike pread/pwrite on /proc/self/mem, whose access_remote_vm path
sets FAULT_FLAG_REMOTE and FOLL_FORCE and intentionally bypasses UFFD
(so a debugger can't deadlock on a uffd-controlled page).

The parent's mmap and UFFDIO_REGISTER are unchanged. The cross-process
RPC layer is unchanged. No production code is touched.
The STW-deadlock hazard those tests were guarding against is gone after
the previous commit (madvise pre-faulting), so the workaround in
caf0377 and the older serialisations on TestFaultedShortCircuitOrdering,
TestRemoveThenWriteGated, and TestWriteThenRemoveGated are no longer
needed.

This:
  - Reverts caf0377 by re-adding t.Parallel() on
    TestStaleSourceRaceMissingAndRemove and
    TestNoMadviseDeadlockWithInflightCopy (both outer and inner subtests).
  - Drops the //nolint:paralleltest,tparallel comments on
    TestFaultedShortCircuitOrdering, TestRemoveThenWriteGated, and
    TestWriteThenRemoveGated and re-adds t.Parallel() on each.

The race tests now also pre-fault their direct page reads via
MADV_POPULATE_READ for the same reason: an ad-hoc page[0] load could
otherwise re-introduce the deadlock on a removed page.
@ValentaTomas ValentaTomas force-pushed the refactor/uffd-test-child-owned-memory branch from 736d083 to b8500e9 Compare May 3, 2026 03:03
@ValentaTomas
Copy link
Copy Markdown
Member Author

Folded into #2520. The two commits (madvise pre-fault + restore t.Parallel) now live there.

@ValentaTomas ValentaTomas deleted the refactor/uffd-test-child-owned-memory branch May 3, 2026 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants