perf: reduce container startup idle wait by ~5s (#1851)

Copilot · web-flow · commit 0be647fd7f1b · 2026-04-09T14:55:15.000-07:00
* Initial plan

* perf: reduce container startup idle wait times

- Replace unconditional `sleep 5` with poll-based wait (up to 1s, 100ms intervals) in both chroot and non-chroot paths of entrypoint.sh. Fast commands exit early without paying the full wait. Saves ~4s on every invocation.
- Reduce DoH proxy healthcheck start_period from 10s to 2s and interval from 5s to 1s, matching the Squid proxy timings.
- Reduce Squid log flush delay from 500ms to 200ms.
- Update documentation to reflect the new timing.

Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/564848fc-f657-47bf-94b2-86ed2b542f0b

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
diff --git a/containers/agent/entrypoint.sh b/containers/agent/entrypoint.sh
@@ -810,8 +810,13 @@ AWFEOF
   " &
   AGENT_PID=$!
 
-  # Wait for agent to initialize and cache tokens (5 seconds)
-  sleep 5
+  # Wait for agent to initialize and cache tokens (up to 1 second)
+  # The one-shot-token LD_PRELOAD library caches tokens in ~100ms during process init.
+  # Poll every 100ms so fast commands (e.g. 'echo ok') don't pay the full wait.
+  for _i in 1 2 3 4 5 6 7 8 9 10; do
+    kill -0 "$AGENT_PID" 2>/dev/null || break
+    sleep 0.1
+  done
 
   # Unset all sensitive tokens from parent shell environment
   unset_sensitive_tokens
@@ -872,8 +877,13 @@ else
   fi
   AGENT_PID=$!
 
-  # Wait for agent to initialize and cache tokens (5 seconds)
-  sleep 5
+  # Wait for agent to initialize and cache tokens (up to 1 second)
+  # The one-shot-token LD_PRELOAD library caches tokens in ~100ms during process init.
+  # Poll every 100ms so fast commands (e.g. 'echo ok') don't pay the full wait.
+  for _i in 1 2 3 4 5 6 7 8 9 10; do
+    kill -0 "$AGENT_PID" 2>/dev/null || break
+    sleep 0.1
+  done
 
   # Unset all sensitive tokens from parent shell environment
   unset_sensitive_tokens
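Reviewer sketch (not part of the patch): the two hunks above inline the same loop. To see the early-exit behaviour in isolation, the loop can be wrapped in a helper that counts how many 100ms sleeps it actually took. The names `poll_wait` and `POLLS_USED` are hypothetical and exist only for this illustration. The sketch assumes bash and a `sleep` that accepts fractional seconds (GNU coreutils and BusyBox do).

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; the commit inlines the loop instead.
# Sets POLLS_USED to the number of 100ms sleeps actually taken.
poll_wait() {
  local pid=$1 max_polls=${2:-10}
  POLLS_USED=0
  while [ "$POLLS_USED" -lt "$max_polls" ]; do
    # kill -0 sends no signal; it only checks that the PID still exists.
    kill -0 "$pid" 2>/dev/null || break
    sleep 0.1
    POLLS_USED=$((POLLS_USED + 1))
  done
}

# Fast command: gone within a poll or two, so the loop exits early.
true &
poll_wait "$!"
fast_polls=$POLLS_USED

# Long-running command: still alive after 10 polls, so the full ~1s is spent.
sleep 5 &
slow_pid=$!
poll_wait "$slow_pid"
slow_polls=$POLLS_USED

# Clean up the stand-in process so the script does not linger.
kill "$slow_pid" 2>/dev/null
wait "$slow_pid" 2>/dev/null || true

echo "fast: $fast_polls polls, slow: $slow_polls polls"
```

A fast command costs at most a few polls, while a long-running one always pays the full 10 polls (about 1 second) before `unset_sensitive_tokens` runs. The exact count for the fast case depends on how quickly the shell reaps the child.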
diff --git a/docs/authentication-architecture.md b/docs/authentication-architecture.md
@@ -313,7 +313,7 @@ char* getenv(const char* name) {
 
 **Source:** `containers/agent/entrypoint.sh`
 
-The entrypoint (PID 1) runs the agent command in the background, then unsets sensitive tokens from its own environment after a 5-second grace period:
+The entrypoint (PID 1) runs the agent command in the background, then unsets sensitive tokens from its own environment after a brief grace period (up to 1 second, polling every 100ms):
 
 ```bash
 unset_sensitive_tokens() {
@@ -336,7 +336,11 @@ unset_sensitive_tokens() {
 # Run agent in background, wait for it to cache tokens, then unset
 capsh --drop=cap_net_admin -- -c "exec gosu awfuser $COMMAND" &
 AGENT_PID=$!
-sleep 5
+# Poll every 100ms for up to 1s; exit early if agent finishes
+for _i in 1 2 3 4 5 6 7 8 9 10; do
+  kill -0 "$AGENT_PID" 2>/dev/null || break
+  sleep 0.1
+done
 unset_sensitive_tokens
 wait $AGENT_PID
 ```
diff --git a/docs/token-unsetting-fix.md b/docs/token-unsetting-fix.md
@@ -17,14 +17,14 @@ Modified the entrypoint to unset all sensitive tokens from its own environment a
 
 2. **Modified chroot mode execution** (entrypoint.sh:449-468)
    - Changed from `exec chroot ...` to `chroot ... &` (run in background)
-   - Added 5-second sleep to allow agent to initialize and cache tokens
+   - Added poll-based wait (up to 1 second, 100ms intervals) to allow agent to initialize and cache tokens
    - Call `unset_sensitive_tokens()` to clear tokens from parent shell
    - Use `wait $AGENT_PID` to wait for agent completion
    - Exit with agent's exit code
 
 3. **Modified non-chroot mode execution** (entrypoint.sh:484-499)
    - Changed from `exec capsh ...` to `capsh ... &` (run in background)
-   - Added 5-second sleep to allow agent to initialize and cache tokens
+   - Added poll-based wait (up to 1 second, 100ms intervals) to allow agent to initialize and cache tokens
    - Call `unset_sensitive_tokens()` to clear tokens from parent shell
    - Use `wait $AGENT_PID` to wait for agent completion
    - Exit with agent's exit code
@@ -47,15 +47,15 @@ The following tokens are unset from the entrypoint's environment:
 
 1. **t=0s**: Container starts, entrypoint receives tokens in environment
 2. **t=0s**: Entrypoint starts agent command in background
-3. **t=0-5s**: Agent initializes, reads tokens via getenv(), one-shot-token library caches them
-4. **t=5s**: Entrypoint calls `unset_sensitive_tokens()`, clearing tokens from `/proc/1/environ`
-5. **t=5s+**: Agent continues running with cached tokens, `/proc/1/environ` no longer contains tokens
+3. **t=0-1s**: Agent initializes, reads tokens via getenv(), one-shot-token library caches them
+4. **t=≤1s**: Entrypoint calls `unset_sensitive_tokens()`, clearing tokens from `/proc/1/environ`
+5. **t=≤1s+**: Agent continues running with cached tokens, `/proc/1/environ` no longer contains tokens
 6. **t=end**: Agent completes, entrypoint exits with agent's exit code
 
 ### Security Impact
 
 - **Before**: Tokens accessible via `/proc/1/environ` throughout agent execution
-- **After**: Tokens accessible via `/proc/1/environ` only for first 5 seconds, then cleared
+- **After**: Tokens accessible via `/proc/1/environ` only for up to 1 second, then cleared
 - **Agent behavior**: Unchanged - agent can still read tokens via getenv() (cached by one-shot-token library)
 
 ### Testing
@@ -75,6 +75,6 @@ Manual test script at `test-token-unset.sh`:
 
 ## Notes
 
-- The 5-second delay is necessary to give the agent process time to initialize and cache tokens via the one-shot-token library before the parent shell unsets them
+- The poll-based wait (up to 1 second, checking every 100ms) gives the agent process time to initialize and cache tokens via the one-shot-token library before the parent shell unsets them. Fast commands exit early without waiting the full second.
 - Both token lists (entrypoint.sh and one-shot-token library) must be kept in sync when adding new token types
 - The exit code handling is preserved - the entrypoint exits with the agent's exit code
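Reviewer sketch (not part of the patch): the note on exit-code handling still holds when the poll loop breaks early, because bash remembers the status of a background child that has already exited, and a later `wait` on its PID returns that status. A minimal check, assuming bash and using `bash -c 'exit 7'` as a stand-in for the agent command (`agent_status` is a name chosen for this illustration):

```bash
#!/usr/bin/env bash
# Stand-in for the agent: exits quickly with a distinctive status.
bash -c 'exit 7' &
AGENT_PID=$!

# Same loop shape as the commit: break as soon as the process is gone.
for _i in 1 2 3 4 5 6 7 8 9 10; do
  kill -0 "$AGENT_PID" 2>/dev/null || break
  sleep 0.1
done

# The child has already exited, but the shell still holds its status.
# The "|| agent_status=$?" form keeps this safe under `set -e`.
agent_status=0
wait "$AGENT_PID" || agent_status=$?

echo "agent exit status: $agent_status"
```

This is why the entrypoint can break out of the loop, unset the tokens, and still exit with the agent's exit code.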
diff --git a/src/docker-manager.ts b/src/docker-manager.ts
@@ -1658,10 +1658,10 @@ export function generateDockerCompose(
       command: ['proxy-dns', '--address', '0.0.0.0', '--port', '53', '--upstream', config.dnsOverHttps],
       healthcheck: {
         test: ['CMD', 'nslookup', '-port=53', 'cloudflare.com', '127.0.0.1'],
-        interval: '5s',
+        interval: '1s',
         timeout: '3s',
         retries: 5,
-        start_period: '10s',
+        start_period: '2s',
       },
       // Security hardening: Drop all capabilities
       cap_drop: ['ALL'],
@@ -2301,7 +2301,7 @@ export async function runAgentCommand(workDir: string, allowedDomains: string[],
     logger.debug(`Agent exit code: ${exitCode}`);
 
     // Small delay to ensure Squid logs are flushed to disk
-    await new Promise(resolve => setTimeout(resolve, 500));
+    await new Promise(resolve => setTimeout(resolve, 200));
 
     // Check Squid logs to see if any domains were blocked (do this BEFORE cleanup)
     const { hasDenials, blockedTargets } = await checkSquidLogs(workDir, proxyLogsDir);