feat(uffd,fc): balloon free-page-hinting + envd reclaim on pause #2550
PR Summary: Medium Risk. Reviewed by Cursor Bugbot for commit 7efc0d9.
Updated from cfb09ae to 506542b.
Updated from 67032c5 to 3fe4149.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 3fe4149.
```go
// acknowledges or ctx fires. No-op when the balloon wasn't installed.
func (p *Process) DrainBalloon(ctx context.Context) error {
	if !p.balloonInstalled {
		return nil
	}
	// ... (rest of the function is outside the reviewed hunk)
}
```
balloonInstalled never set on resume path breaks DrainBalloon
High Severity
`DrainBalloon` checks `p.balloonInstalled` and returns `nil` if it is false, but `balloonInstalled` is only set to true in the Create path (line 450). The Resume path never sets it, even though resumed VMs inherit the balloon device from the snapshot. As a result, `DrainBalloon` is a permanent no-op for every resumed sandbox, and resumed sandboxes are the primary use case for the FPH drain: live sandbox pause via the server, template layer builds via `ResumeSandbox`, and the resume-build CLI tool.
Updated from 3fe4149 to 7efc0d9.
Superseded by the splits; closing.


Adds a pre-pause guest reclaim step (`sync` + `drop_caches` + `compact_memory` + `fstrim`, run via the existing envd Process service with `Connect-Timeout-Ms`) and a virtio-balloon free-page-hinting (FPH) drain that `MADV_DONTNEED`s the freed pages out of the memfile before the snapshot. The balloon is installed with FPH armed whenever FPR is on; both behaviors are off by default and gated at runtime by separate LD flags (`free-page-hinting`, `reclaim-on-pause`), so they can be flipped without rebuilding templates.

Pause order: `bestEffortReclaim` (guest) → `DrainBalloon` (host-initiated FPH) → `Pause` → `Snapshot`.

Reclaim is best-effort: when `Connect-Timeout-Ms` expires, envd kills bash, the in-flight kernel write finishes, and the remaining steps are skipped. The FPH drain has its own ~1.5s ceiling and is non-fatal; failures fall through to pause.

Depends on #2541 → #2545 → #2520.