Add Trusted Server audit command #800
prk-Jr
left a comment
Summary
Adds ts audit: loads a public page in a fresh headless Chrome/Chromium session, inventories rendered JS assets, detects known integrations, and writes a JS-asset report plus a draft trusted-server.toml. Clean AuditCollector trait makes the analysis browser-free and well-tested. One blocking item: the new scraper dependency breaks the integration dependency-parity CI gate.
Blocking
🔧 wrench
- Dependency-parity CI failure: scraper = "0.24.0" added to the workspace conflicts with integration-tests' pinned scraper = "0.21"; the prepare integration artifacts job fails on markup5ever / match_token / scraper / selectors (plus uuid, web-sys, wasm-bindgen-futures). Align the versions or extend the allowlist; see inline on Cargo.toml.
Non-blocking
🤔 thinking
- Inline substring integration detection is false-positive prone: analyzer.rs:217; auto-enables gpt/didomi/datadome in the draft.
- First-party classifier over-matches parent/eTLD domains: analyzer.rs:169.
- No overall navigation timeout: browser_collector.rs:98; a hanging origin stalls ts audit.
♻️ refactor
- report_error is a misleading no-op wrapper: error.rs:7; overlaps cli_error.
⛏ nitpick
- Asymmetric URL resolution: analyzer.rs:68; a relative src from collector script tags is silently dropped.
👍 praise
- AuditCollector trait → browser-free unit tests via FakeCollector; strong testability.
- Browser hygiene: fresh temp user-data-dir per run, close() always called, handler task aborted on close failure, no forced --no-sandbox.
- parse_audit_url restricts to http/https (blocks file://, data:, chrome://), with a test.
- Overwrite protection + a pre-collection conflict check, so a refusal doesn't even launch Chrome.
- GTM container id constrained by GTM-[A-Z0-9]+ → no TOML injection from page content into the draft.
- audit module and all host-only deps correctly gated behind cfg(not(target_arch = "wasm32")).
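The GTM constraint praised above can be pictured as a full-match check applied before the id is written into the draft. This is a minimal std-only sketch, assuming the pattern is meant to cover the whole id; is_gtm_container_id is a hypothetical name, not a function from this PR.

```rust
/// Hypothetical helper: accepts only ids of the form GTM-[A-Z0-9]+,
/// matched against the entire string (an assumption about the intent).
fn is_gtm_container_id(candidate: &str) -> bool {
    match candidate.strip_prefix("GTM-") {
        // At least one character after the prefix, all uppercase ASCII
        // letters or digits, so quotes and newlines can never get through.
        Some(rest) => {
            !rest.is_empty()
                && rest
                    .bytes()
                    .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
        }
        None => false,
    }
}
```

Because quotes, newlines, and whitespace are rejected, a value that passes this check cannot close a TOML string or start a new key.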
CI Status
- prepare integration artifacts (dependency parity): FAIL (caused by this PR)
- integration / edgezero / browser tests: SKIPPED (blocked by the failed prepare job)
- fmt / clippy / cargo test / vitest: not run. Those workflows trigger only on PRs targeting main, and this PR targets feature/ts-cli-base. Author reports cargo test --package trusted-server-cli passing (42 tests) locally.
mime = "0.3"
rand = "0.8"
regex = "1.12.3"
scraper = "0.24.0"
🔧 wrench — This scraper = "0.24.0" (plus chromiumoxide) is the direct cause of the red prepare integration artifacts CI job. The workspace now resolves markup5ever 0.35 / selectors 0.31 / match_token 0.35, but crates/trusted-server-integration-tests pins scraper = "0.21" (markup5ever 0.14 / selectors 0.26), so the transitive-parity check fails (also flags uuid, web-sys, wasm-bindgen-futures).
Must be green before merge. Options:
- Preferred: align scraper across the workspace and integration-tests (bump integration-tests to 0.24, or pin audit to 0.21) so the trees converge.
- If the split is intentional: add markup5ever, match_token, scraper, selectors to transitive_parity_allowlist in scripts/check-integration-dependency-versions.sh, and run the suggested cargo update --manifest-path crates/trusted-server-integration-tests/Cargo.toml -p <crate> --precise <v> for the patch-level drifts (uuid, web-sys, wasm-bindgen-futures).
}

let lowered = script.to_ascii_lowercase();
for integration in ["gpt", "didomi", "datadome", "permutive", "lockr", "prebid"] {
🤔 thinking — Lowercased substring matching is false-positive prone, especially "gpt" (3 chars) which can appear in unrelated script text. Hits for gpt/didomi/datadome later auto-set enabled = true in the draft config (audit.rs build_draft_config). Blast radius is low (the draft is reviewed and must pass ts config validate), but a word-boundary or more specific marker per integration would cut noise.
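To illustrate the word-boundary idea, here is a rough std-only sketch. contains_word is a hypothetical helper, not code from this PR, and treating ASCII alphanumerics plus underscore as word characters is an assumption; per-integration markers may still be the better fix.

```rust
/// Word characters for boundary purposes (an assumption for this sketch).
fn is_word_byte(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || byte == b'_'
}

/// Hypothetical helper: true when `marker` occurs in `script`
/// (ASCII case-insensitive) with a non-word character or the string
/// edge on both sides.
fn contains_word(script: &str, marker: &str) -> bool {
    let haystack = script.to_ascii_lowercase();
    let needle = marker.to_ascii_lowercase();
    let Some(first) = needle.chars().next() else {
        return false; // an empty marker never matches
    };
    let bytes = haystack.as_bytes();
    let mut from = 0;
    while let Some(offset) = haystack[from..].find(&needle) {
        let begin = from + offset;
        let end = begin + needle.len();
        let left_ok = begin == 0 || !is_word_byte(bytes[begin - 1]);
        let right_ok = end == bytes.len() || !is_word_byte(bytes[end]);
        if left_ok && right_ok {
            return true;
        }
        // Step past the first character of this hit and keep searching.
        from = begin + first.len_utf8();
    }
    false
}
```

With this check, an identifier such as imgPtr no longer counts as a gpt hit, while window.gpt still does. The trade-off is that compound names like didomiConfig stop matching too, so the marker list would need to be revisited alongside it.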
}
}

fn host_matches(page_host: &str, asset_host: &str) -> bool {
🤔 thinking — The page_host.strip_suffix(asset_host) direction classifies assets served from a parent or public-suffix host as first-party — e.g. page foo.co.uk with an asset on co.uk resolves to FirstParty. Acceptable as a heuristic for an advisory tool, but a public-suffix-aware check (psl/publicsuffix) would avoid treating eTLD/parent domains as same-party.
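A sketch of what a suffix-aware comparison could look like, using only std. DEMO_SUFFIXES is a tiny hard-coded stand-in for the real public suffix list, and registrable_domain and same_party are hypothetical names; real code would take the suffix data from a crate such as psl or publicsuffix rather than this table.

```rust
/// Stand-in for the public suffix list (assumption: illustration only).
const DEMO_SUFFIXES: &[&str] = &["com", "org", "uk", "co.uk"];

/// Hypothetical helper: the public suffix plus one label, or None when the
/// host is itself a public suffix or has no known suffix.
fn registrable_domain(host: &str) -> Option<String> {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let labels: Vec<&str> = host.split('.').collect();
    // Scanning from the left tries the longest candidate suffix first.
    for start in 0..labels.len() {
        let candidate = labels[start..].join(".");
        if DEMO_SUFFIXES.iter().any(|suffix| *suffix == candidate.as_str()) {
            if start == 0 {
                return None; // the host is a bare public suffix
            }
            return Some(labels[start - 1..].join("."));
        }
    }
    None
}

/// Two hosts are same-party only when both have a registrable domain
/// and those domains are equal.
fn same_party(page_host: &str, asset_host: &str) -> bool {
    match (registrable_domain(page_host), registrable_domain(asset_host)) {
        (Some(page), Some(asset)) => page == asset,
        _ => false,
    }
}
```

Under these assumptions, page foo.co.uk with an asset on co.uk is no longer first-party, while sibling subdomains such as www.example.com and cdn.example.com still are.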
    report_error(format!("failed to create browser page for audit: {error}"))
})?;

page.goto(target_url.as_str())
🤔 thinking — wait_for_page_settle caps at 6s, but goto / wait_for_navigation_response (and browser.close()) have no timeout. A slow or hanging origin can stall ts audit indefinitely. Consider wrapping navigation in tokio::time::timeout so the command fails cleanly with a partial-result warning instead of hanging.
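The deadline pattern itself, sketched with std only so it stands alone. In the async collector the equivalent would be wrapping the navigation future in tokio::time::timeout; run_with_deadline is a hypothetical helper and the error strings are illustrative.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `step` on a worker thread and gives up
/// waiting after `limit`, returning an error instead of blocking forever.
fn run_with_deadline<T, F>(limit: Duration, step: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    // The worker is detached: on timeout it keeps running in the
    // background, and its late result is dropped with the channel.
    thread::spawn(move || {
        let _ = sender.send(step());
    });
    match receiver.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("step did not finish within {limit:?}"))
        }
        Err(RecvTimeoutError::Disconnected) => {
            Err("step ended without producing a result".to_string())
        }
    }
}
```

The caller gets a clean error it can turn into a partial-result warning. Unlike dropping a timed-out future, the thread version cannot cancel the work, which is one reason the async timeout is the better fit for the real collector.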
    Err(message.into())
}

pub(crate) fn report_error(message: impl Into<String>) -> String {
♻️ refactor — report_error returns its input unchanged with no logging or other side effect, so the name overpromises and it overlaps cli_error. Either give it real behavior (e.g. log::error! before returning) or drop it and construct the String / use cli_error directly.
for tag in &collected.script_tags {
    if let Some(src) = &tag.src {
        if let Ok(asset_url) = Url::parse(src) {
⛏ nitpick — DOM scripts above use final_url.join(src) (relative-aware) while collector script_tags here use Url::parse(src) (absolute-only). A relative src coming from script_tags is silently dropped with no warning, unlike the DOM path which warns on unresolvable URLs. Consider final_url.join here too for symmetry.
Summary
- Add ts audit as a Trusted Server-specific page audit command on top of the base CLI.
Changes
- crates/trusted-server-cli/src/audit.rs
- crates/trusted-server-cli/src/audit/analyzer.rs
- crates/trusted-server-cli/src/audit/browser_collector.rs
- crates/trusted-server-cli/src/audit/collector.rs
- crates/trusted-server-cli/src/args.rs, src/run.rs, src/lib.rs: wires ts audit into CLI parsing and dispatch.
- Cargo.toml, crates/trusted-server-cli/Cargo.toml, Cargo.lock
- README.md, docs/guide/cli.md, docs/guide/getting-started.md
- docs/superpowers/...audit...
Closes
No issue provided; this PR is split out from the combined feature/ts-cli-next branch.
Test plan
- cargo test --workspace
- cargo clippy --workspace --all-targets --all-features -- -D warnings
- cargo fmt --all -- --check
- cd crates/js/lib && npx vitest run
- cd crates/js/lib && npm run format
- cd docs && npm run format
- cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
- fastly compute serve
- cargo test --package trusted-server-cli --target aarch64-apple-darwin (42 passed)
Checklist
- unwrap() in production code: use expect("should ...")
- ... println!)