Skip to content

Commit e5274b5

Browse files
committed
fix(uffd): re-read page state inside worker goroutine under settleRequests.RLock
The Serve() loop previously read pageTracker state and captured `source = u.src` in the parent loop, then dispatched a worker goroutine. A REMOVE event for the same page that arrived between the state read and the worker actually acquiring settleRequests.RLock() would silently leave the worker with a stale `source = u.src` snapshot. The worker would then UFFDIO_COPY src bytes into a page the kernel had just MADV_DONTNEED'd, leaving pageTracker == removed and the kernel page mapped with stale src data — and observably deadlocking parent madvise() in the orchestrator unit-test suite. Move the state lookup and source capture inside the goroutine, after RLock(). The read+act+commit sequence is now atomic with respect to the REMOVE batch (which takes settleRequests.Lock()). Makes the three deterministic race tests added in the parent PR pass: - TestStaleSourceRaceMissingAndRemove (the one that intentionally failed on the parent PR with `page 1 first byte: want 0 ... got 0xc3`) - TestNoMadviseDeadlockWithInflightCopy (already passed; now stays green) - TestFaultedShortCircuitOrdering (already passed; now stays green) Soak: -count=20 -timeout=30s passes deterministically on this branch.
1 parent 1e978ee commit e5274b5

1 file changed

Lines changed: 24 additions & 22 deletions

File tree

packages/orchestrator/pkg/sandbox/uffd/userfaultfd/userfaultfd.go

Lines changed: 24 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -323,36 +323,38 @@ func (u *Userfaultfd) Serve(
323323
return fmt.Errorf("failed to map: %w", err)
324324
}
325325

326-
// State read happens before the worker takes settleRequests.RLock,
327-
// so a REMOVE arriving in the parent's next iteration can race
328-
// with an already-scheduled worker. Tracked as a known race; fix
329-
// re-reads state under RLock in the worker.
330-
var source block.Slicer
331-
switch state := u.pageTracker.get(addr); state {
332-
case faulted:
333-
// Already mapped (prefault or earlier fault in this batch).
334-
// Only a UFFD_EVENT_REMOVE can transition out of `faulted`;
335-
// the used pages must not be swappable for this to hold.
336-
continue
337-
case removed:
338-
// Zero-fill: source stays nil.
339-
case missing:
340-
source = u.src
341-
default:
342-
return fmt.Errorf("unexpected pageState: %#v", state)
343-
}
344-
345326
u.wg.Go(func() error {
346327
if h := u.testFaultHook.Load(); h != nil {
347328
(*h)(addr, faultPhaseBeforeRLock)
348329
}
349330

350-
// RLock inside the goroutine so RUnlock runs via defer even on
351-
// early return, and so it pairs with the prefetchTracker write
352-
// below.
331+
// RLock must be inside the goroutine so RUnlock runs via defer.
332+
// The state read below MUST happen after RLock: the read+act+commit
333+
// sequence (state lookup → faultPage → setState(faulted)) must be
334+
// atomic with any concurrent REMOVE batch (settleRequests.Lock()).
335+
// A state read in the parent loop would leave a window where a REMOVE
336+
// lands after the read but before RLock, and the worker would
337+
// overwrite `removed` with `faulted`.
353338
u.settleRequests.RLock()
354339
defer u.settleRequests.RUnlock()
355340

341+
var source block.Slicer
342+
343+
switch state := u.pageTracker.get(addr); state {
344+
case faulted:
345+
// Already mapped; only UFFD_EVENT_REMOVE transitions out of
346+
// faulted. Pages must not be swappable for this to hold.
347+
return nil
348+
case removed:
349+
// Zero-fill. The kernel still expects an UFFDIO_COPY/ZEROPAGE
350+
// ack for the original MISSING fault, otherwise the faulting
351+
// thread stays blocked.
352+
case missing:
353+
source = u.src
354+
default:
355+
return fmt.Errorf("unexpected pageState: %#v", state)
356+
}
357+
356358
var accessType block.AccessType
357359
if pf.flags&UFFD_PAGEFAULT_FLAG_WRITE == 0 {
358360
accessType = block.Read

0 commit comments

Comments
 (0)