fix: make managed-agent spawn and teardown portable to Windows #1097

Merged
wpfleger96 merged 7 commits into main from duncan/windows-build-portability
Jun 17, 2026
Conversation

@wpfleger96

Collaborator

Several managed-agent spawn, teardown, and shim paths were #[cfg(unix)]-only and silently no-op (or returned a falsey stub) on Windows, breaking the desktop build four ways. All four fixes share one Windows-portability theme.

#2 / #4 — MCP PermissionDenied on C:\Windows

buzz-agent spawns the MCP with cmd.env_clear() then re-adds only an allowlist (PASSTHROUGH_ENV) that had TMPDIR but none of the Windows temp/profile vars. Stripped of TMP/TEMP/USERPROFILE, std::env::temp_dir() falls all the way back to C:\Windows, where Shim::install() can't create its tempdir → PermissionDenied (os error 5) → every MCP init dies.

  • crates/buzz-agent/src/mcp.rs: cfg-gated PASSTHROUGH_ENV_WINDOWS adds TMP, TEMP, USERPROFILE (load-bearing for temp_dir(); USERPROFILE is the always-set floor) plus LOCALAPPDATA, APPDATA (child-tool config — git, etc.).
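The allowlist pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in `mcp.rs`: the helper names `passthrough_names` and `scrubbed_command` are hypothetical, `PATH` in the base list is an assumption, and the Windows list is selected with a runtime flag here (rather than a `#[cfg]` gate) only so both arms can be exercised on any host.

```rust
use std::process::Command;

/// Base allowlist. TMPDIR is named in the description above; PATH is an
/// assumption added for illustration.
const PASSTHROUGH_ENV: &[&str] = &["PATH", "TMPDIR"];

/// Windows additions listed in the description: TMP/TEMP/USERPROFILE feed
/// std::env::temp_dir(), LOCALAPPDATA/APPDATA are for child-tool config.
const PASSTHROUGH_ENV_WINDOWS: &[&str] =
    &["TMP", "TEMP", "USERPROFILE", "LOCALAPPDATA", "APPDATA"];

/// Hypothetical helper: the variable names to re-add after env_clear().
fn passthrough_names(windows: bool) -> Vec<&'static str> {
    let mut names = PASSTHROUGH_ENV.to_vec();
    if windows {
        names.extend_from_slice(PASSTHROUGH_ENV_WINDOWS);
    }
    names
}

/// Hypothetical helper: build a child command with a scrubbed environment,
/// copying through only the allowlisted variables that are set in the parent.
fn scrubbed_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear();
    for name in passthrough_names(cfg!(windows)) {
        if let Some(value) = std::env::var_os(name) {
            cmd.env(name, value);
        }
    }
    cmd
}
```

Without the Windows names in the list, the child starts with none of the variables `std::env::temp_dir()` consults, which is the failure this fix addresses.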

The shim install path had two more Unix-isms that would silently break buzz/git shell-outs even after temp_dir() was fixed:

  • crates/buzz-dev-mcp/src/shim.rs: PATH was built with a hardcoded : separator — now std::env::split_paths/join_paths (platform separator). The #[cfg(not(unix))] multicall copy dropped the .exe extension PATHEXT needs to treat the file as runnable — now appended.
  • crates/buzz-dev-mcp/src/lib.rs: multicall dispatch matches on file_stem() instead of file_name(), so the .exe copies (rg.exe, buzz.exe, ...) route to the correct match arm.
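A rough sketch of the three shim fixes above, using only the standard library. The function names are hypothetical and the real `shim.rs` / `lib.rs` code may be shaped differently; in particular, `std::env::consts::EXE_SUFFIX` is used here as one way to get `.exe` on Windows and an empty suffix elsewhere, which is an assumption about approach, not a description of the PR.

```rust
use std::env;
use std::ffi::OsString;
use std::path::Path;

/// Hypothetical helper: put the shim directory first on PATH using the
/// platform separator (';' on Windows, ':' on Unix) instead of a literal ':'.
fn prepend_to_path(
    shim_dir: &Path,
    current: Option<OsString>,
) -> Result<OsString, env::JoinPathsError> {
    let mut entries = vec![shim_dir.to_path_buf()];
    if let Some(current) = current {
        entries.extend(env::split_paths(&current));
    }
    env::join_paths(entries)
}

/// Hypothetical helper: file name for a multicall copy. EXE_SUFFIX is ".exe"
/// on Windows and "" elsewhere.
fn shim_file_name(tool: &str) -> String {
    format!("{tool}{}", env::consts::EXE_SUFFIX)
}

/// Hypothetical helper: the name to dispatch on. file_stem() drops the final
/// extension, so "rg.exe" and "rg" both yield "rg".
fn multicall_name(argv0: &Path) -> Option<&str> {
    argv0.file_stem()?.to_str()
}
```

Dispatching on `file_name()` instead would see `rg.exe`, which matches no `"rg"` arm.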

#1 — stray buzz-acp.exe console + orphaned process tree

The buzz-acp child spawned with no CREATE_NO_WINDOW flag (a console window popped and lingered), and the non-unix stop path was Child::kill(), which kills only the harness and orphans the 24 agent workers + MCP servers it spawned.

  • desktop/src-tauri/src/managed_agents/process_lifecycle.rs (new, #[cfg(windows)]): a Win32 Job Object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE) owns the harness tree and reaps it when the handle drops — the Windows mirror of the Unix process_group(0) teardown. The after-restart path (PID-only, no handle) falls back to taskkill /T.
  • desktop/src-tauri/src/managed_agents/runtime.rs: CREATE_NO_WINDOW at the spawn site; stop_managed_agent_process drops the job handle on Windows (falls back to Child::kill() if assignment failed); terminate_process delegates to taskkill_tree; the augmented-PATH builder also moves off the hardcoded : separator.
  • desktop/src-tauri/src/managed_agents/types.rs: ManagedAgentProcess carries the #[cfg(windows)] job handle.

The Job Object dies with its owner, so a co-located sibling instance's agents are never affected — no env-read PEB walk or identity-matched sweep is needed.
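The PID-only fallback and the console-window flag can be sketched with the standard library alone; the Job Object itself needs Win32 bindings and is left out. The helper names below are hypothetical, the `/F` (force) switch is an assumption (the description only mentions `taskkill /T`), and applying `CREATE_NO_WINDOW` to the `taskkill` child is likewise an illustrative choice.

```rust
use std::process::Command;

/// Win32 process-creation flag that suppresses the console window.
#[cfg(windows)]
const CREATE_NO_WINDOW: u32 = 0x0800_0000;

/// Hypothetical helper: apply CREATE_NO_WINDOW on Windows, no-op elsewhere.
#[cfg(windows)]
fn hide_console(cmd: &mut Command) {
    use std::os::windows::process::CommandExt;
    cmd.creation_flags(CREATE_NO_WINDOW);
}

#[cfg(not(windows))]
fn hide_console(_cmd: &mut Command) {}

/// Hypothetical helper: build (but do not run) a taskkill invocation that
/// terminates `pid` and its descendants. /T selects the whole tree; /F is an
/// assumption, not taken from the PR.
fn taskkill_tree_command(pid: u32) -> Command {
    let pid = pid.to_string();
    let mut cmd = Command::new("taskkill");
    cmd.args(["/PID", pid.as_str(), "/T", "/F"]);
    hide_console(&mut cmd);
    cmd
}
```

A caller would run the result with `.status()`. Unlike the Job Object, this path only knows a PID, so it depends on the process tree still being discoverable from that PID.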

#4 — agent_cmd=goose acp, "program not found"

With an empty managed-agents.json, a freshly-created agent fell through to the platform default goose, which isn't on PATH on a stock Windows install → all 24 workers failed with program not found.

  • desktop/src-tauri/src/managed_agents/discovery.rs: new default_agent_command() catalog-resolves the bundled buzz-agent — the same shape mesh_llm::preset already uses, so the default can't drift from the provider definition. buzz-agent takes no acp arg, so there's no arg leakage.
  • desktop/src-tauri/src/commands/agents.rs, types.rs: create path uses the resolver; the dead DEFAULT_AGENT_COMMAND = "goose" const is removed.

#3 — "Check for Updates" silently does nothing

v0.3.24 is the latest tag, so the build is genuinely up-to-date. But when the updater plugin is unavailable, the hook collapsed to idle, re-rendering the same "Check for Updates" button — indistinguishable from a no-op.

  • desktop/src/features/settings/hooks/use-updater.ts: the unavailable branch now console.warns (so the firing branch is diagnosable in the Windows app log) and sets a visible unavailable state instead of idle.
  • desktop/src/features/settings/UpdateChecker.tsx: renders a clear "Automatic updates aren't available on this build" row.

Verification

Verified on the macOS dev host: just desktop-check, just desktop-test (922 TS tests), just desktop-tauri-test (547 Rust tests incl. the new default_agent_command test and windows_passthrough_includes_temp_dir_vars), cargo clippy/cargo fmt clean across the touched workspace crates and the Tauri crate.

The #[cfg(windows)] Job Object, CREATE_NO_WINDOW, shim install, and the updater branch are not exercised by the host test suite and were not type-checked by a Windows compiler (no local MSVC toolchain). They were verified by inspecting windows-sys 0.61 symbol signatures. Windows CI / a manual run on Will's box is the real gate for those surfaces.

A separate minor issue was noted but intentionally not addressed: devtools is unreachable on release Windows builds because windows_subsystem="windows" non-debug compiles the inspector out.

npub1mn7jgtj4w2pd0g0zeuhxsa6jy6p0rewxz4kujt98my82ahfmp72sxjexk7 and others added 5 commits June 17, 2026 17:48
Several spawn, teardown, and shim paths were #[cfg(unix)]-only and silently
no-op (or returned a falsey stub) on Windows, breaking the desktop build four
ways. All four are in the same Windows-portability theme.

#2/#4 (MCP PermissionDenied on C:\Windows): buzz-agent spawns the MCP with
env_clear() then re-adds only an allowlist that omitted the Windows temp/profile
vars. Stripped of TMP/TEMP/USERPROFILE, std::env::temp_dir() falls back to
C:\Windows and Shim::install() can't write there. Pass the Windows vars through
(cfg-gated). The shim itself had two more Unix-isms in the same install path:
the PATH separator was hardcoded ':' (now std::env::join_paths) and the
non-unix multicall copies dropped the .exe extension PATHEXT needs to exec them.
Multicall dispatch now matches on file_stem() so the .exe copies route correctly.

#1 (stray console + orphaned process tree): the buzz-acp child spawned with no
CREATE_NO_WINDOW (console popped) and the non-unix stop path was Child::kill(),
which kills only the harness and orphans the 24 workers + MCP servers. A Win32
Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE now owns the tree and reaps
it when the handle drops — the Windows mirror of the Unix process-group
teardown. The after-restart path (PID only, no handle) falls back to taskkill /T.
The Windows primitives live in a new process_lifecycle module.

#4 (program not found): the create-path default agent command was the bare
`goose`, not on PATH on a stock Windows install. It now catalog-resolves the
bundled `buzz-agent`, the same shape mesh_llm::preset already uses.

#3 (updater silently does nothing): when the updater plugin is unavailable the
hook collapsed to `idle`, re-rendering the same button — indistinguishable from
a no-op. It now sets a visible `unavailable` state and warns to the log so the
firing branch is diagnosable on Will's Windows build.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The doc comment claimed buzz-acp connects to the relay before spawning
its workers, making the spawn-to-assign window structurally empty. Source
contradicts this: in crates/buzz-acp/src/lib.rs the agent pool is built
(agent_pool_ready, line 1061) before the relay connect (line 1098), and
Will's Windows log confirms that order. The window is closed by
assign-latency (microsecond synchronous Win32 calls beating buzz-acp's
tens-of-ms startup), not by child ordering. Comment-only change.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The #[cfg(windows)] code paths (Job Object kill-tree, multicall shim,
MCP env passthrough) were never compiled by CI — no Linux job builds the
MSVC target, and aws-lc-sys needs windows.h, so they shipped verified only
by inspection. Add a windows-latest job that runs clippy, a workspace
cargo check, and the Tauri-crate check + test against
x86_64-pc-windows-msvc, gating exactly the Windows arms.

Uses dtolnay/rust-toolchain rather than hermit (which the Linux jobs use)
because hermit does not provide MSVC; mirrors release.yml's release-windows
toolchain. Sidecar stubs are created before any Tauri compile because Tauri
validates externalBin at compile time.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
The new windows-rust CI job's Tauri-crate test step (--all-targets) was
the first compiler to build these tests against MSVC, and it failed with 10
E0433 errors: migration_tests.rs and migration_team_dir_tests.rs call
std::os::unix::fs::symlink directly with no cfg guard. Production code is
already correctly #[cfg(unix)] / #[cfg(not(unix))] gated, so the workspace
check and clippy passed — only the test compile reached these targets.

These tests assert Unix symlink semantics (create symlink, heal/replace it,
read through it); there is nothing to verify on Windows, where the
production path copies instead. Gate each symlink-using test plus the two
helpers they exclusively use (setup_sync_layout, sync_files) so the helpers
do not trip dead_code under -D warnings on Windows.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
zizmor's superfluous-actions check flagged the dtolnay/rust-toolchain step: windows-latest preinstalls rustup, which honors the repo-root rust-toolchain.toml (channel 1.95.0, profile = default). That profile already provides clippy, and the runner's host triple is x86_64-pc-windows-msvc, so both the explicit toolchain install and the targets/components inputs were no-ops. release.yml's release-windows job keeps its own copy (out of scope, not flagged).

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
@wpfleger96 wpfleger96 force-pushed the duncan/windows-build-portability branch from 632e866 to 5690b2f Compare June 17, 2026 21:48
npub1mn7jgtj4w2pd0g0zeuhxsa6jy6p0rewxz4kujt98my82ahfmp72sxjexk7 and others added 2 commits June 17, 2026 18:03
The windows-rust rust-cache had no workspaces key, so it defaulted to
the repo root and never cached desktop/src-tauri's separate target dir.
The Check/Test (Tauri crate) steps are the heaviest compile on the job
and rebuilt cold every run. Mirror the Desktop E2E Relay pattern to
cache both workspaces, matching every other job that builds the crate.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
reconcile_team_dirs_in_file built the rewritten path with a single
target_dir.join("agents/teams"). On Windows, join does not split the
embedded '/', so it persisted a mixed-separator path
(...app.dev\agents/teams\id) into managed-agents.json — unlike fresh
writes, which build the path per-component and stay all-native. Split
the join so reconcile emits the same native-separator path the rest of
the system stores. Tests now build expectations via shared team_dir /
pack_dir helpers using the same per-component join, so they assert real
production output on both Unix and Windows.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
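
The separator issue in the last commit can be illustrated with a small sketch. The commit mentions shared team_dir / pack_dir test helpers; the signature below is an assumption, and `team_dir_embedded_slash` is a hypothetical name for the old single-join form.

```rust
use std::path::{Path, PathBuf};

/// Per-component join: every separator is inserted by `join`, so the result
/// uses the platform's native separator throughout.
fn team_dir(target_dir: &Path, team_id: &str) -> PathBuf {
    target_dir.join("agents").join("teams").join(team_id)
}

/// Single join with an embedded '/': `join` keeps that character as written,
/// so on Windows the stored string mixes '\' and '/'.
fn team_dir_embedded_slash(target_dir: &Path, team_id: &str) -> PathBuf {
    target_dir.join("agents/teams").join(team_id)
}
```

On Unix both forms produce the same string, which is why the difference only shows up in what gets persisted on Windows.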
@wpfleger96 wpfleger96 merged commit 4201311 into main Jun 17, 2026
28 checks passed
@wpfleger96 wpfleger96 deleted the duncan/windows-build-portability branch June 17, 2026 22:49
tlongwell-block pushed a commit that referenced this pull request Jun 18, 2026
…te-response

* origin/main: (194 commits)
  Fold agent core memory into the session system prompt (#1112)
  feat(cli): add patches and issues commands for NIP-34 git collaboration (#1073)
  fix(desktop): stop random timeline message loss + page reconnect replay (#1105)
  Update README.md
  fix(desktop): keep thread replies from scrolling channel (#1109)
  fix(buzz-acp): accept siblings under allowlist author gate (#1108)
  feat(deploy): add production Helm chart for Buzz (#990)
  fix(desktop): keep MembersSidebar input usable while an add is in flight (#1106)
  chore(release): release version 0.3.25 (#1102)
  fix(desktop): stop dimming deferred message lists (#1104)
  Smooth channel loading: single-surface timeline state machine (#1099)
  feat: surface base + persona system prompts in observer feed (#1103)
  ci: move reminder e2e to a dedicated backend-integration job (#1098)
  fix: give agent-observer sub a replay-capable limit (#1100)
  fix: make managed-agent spawn and teardown portable to Windows (#1097)
  fix(desktop): constrain message timeline width with min-w-0 (#1092)
  feat(desktop): reminders notifications, snooze, overlay, and inbox view mode (#1093)
  feat(prompt): add memory hygiene and hoist universal engineering discipline to base prompt (#1085)
  fix(desktop): correct thread-unread badge flicker, stale clear, phantom count, mention gate, and nested count (#1080)
  Fix mention chip alignment (#1094)
  ...

# Conflicts:
#	crates/buzz-cli/src/commands/workflows.rs
tlongwell-block pushed a commit that referenced this pull request Jun 18, 2026
…te-response

* origin/main: (194 commits)
  Fold agent core memory into the session system prompt (#1112)
  feat(cli): add patches and issues commands for NIP-34 git collaboration (#1073)
  fix(desktop): stop random timeline message loss + page reconnect replay (#1105)
  Update README.md
  fix(desktop): keep thread replies from scrolling channel (#1109)
  fix(buzz-acp): accept siblings under allowlist author gate (#1108)
  feat(deploy): add production Helm chart for Buzz (#990)
  fix(desktop): keep MembersSidebar input usable while an add is in flight (#1106)
  chore(release): release version 0.3.25 (#1102)
  fix(desktop): stop dimming deferred message lists (#1104)
  Smooth channel loading: single-surface timeline state machine (#1099)
  feat: surface base + persona system prompts in observer feed (#1103)
  ci: move reminder e2e to a dedicated backend-integration job (#1098)
  fix: give agent-observer sub a replay-capable limit (#1100)
  fix: make managed-agent spawn and teardown portable to Windows (#1097)
  fix(desktop): constrain message timeline width with min-w-0 (#1092)
  feat(desktop): reminders notifications, snooze, overlay, and inbox view mode (#1093)
  feat(prompt): add memory hygiene and hoist universal engineering discipline to base prompt (#1085)
  fix(desktop): correct thread-unread badge flicker, stale clear, phantom count, mention gate, and nested count (#1080)
  Fix mention chip alignment (#1094)
  ...

Co-authored-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

# Conflicts:
#	crates/buzz-cli/src/commands/workflows.rs