Skip to content

Add Trusted Server audit command#800

Open
ChristianPavilonis wants to merge 1 commit into
feature/ts-cli-basefrom
feature/ts-cli-audit
Open

Add Trusted Server audit command#800
ChristianPavilonis wants to merge 1 commit into
feature/ts-cli-basefrom
feature/ts-cli-audit

Conversation

@ChristianPavilonis

Copy link
Copy Markdown
Collaborator

Summary

  • Adds ts audit as a Trusted Server-specific page audit command on top of the base CLI.
  • Collects rendered page/script evidence with Chrome/Chromium and writes draft audit/config artifacts.
  • Adds audit-specific docs, specs, tests, and host dependencies.

Changes

File Change
crates/trusted-server-cli/src/audit.rs Adds audit orchestration, output planning, artifact writing, and draft config generation.
crates/trusted-server-cli/src/audit/analyzer.rs Detects JS assets, first/third-party classification, redirects, titles, and known integrations.
crates/trusted-server-cli/src/audit/browser_collector.rs Adds headless Chrome/Chromium collection for DOM scripts and network requests.
crates/trusted-server-cli/src/audit/collector.rs Defines audit collection traits/data shapes for testability.
crates/trusted-server-cli/src/args.rs, src/run.rs, src/lib.rs Wires ts audit into CLI parsing and dispatch.
Cargo.toml, crates/trusted-server-cli/Cargo.toml, Cargo.lock Adds audit-only dependencies.
README.md, docs/guide/cli.md, docs/guide/getting-started.md Documents audit workflow and Chrome/Chromium prerequisite.
docs/superpowers/...audit... Adds audit design and implementation plan artifacts.

Closes

No issue provided; this PR is split out from the combined feature/ts-cli-next branch.

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve
  • Other: cargo test --package trusted-server-cli --target aarch64-apple-darwin — 42 passed

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing/log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@prk-Jr prk-Jr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds ts audit: loads a public page in a fresh headless Chrome/Chromium session, inventories rendered JS assets, detects known integrations, and writes a JS-asset report plus a draft trusted-server.toml. Clean AuditCollector trait makes the analysis browser-free and well-tested. One blocking item: the new scraper dependency breaks the integration dependency-parity CI gate.

Blocking

🔧 wrench

  • Dependency-parity CI failurescraper = "0.24.0" added to the workspace conflicts with integration-tests' pinned scraper = "0.21"; the prepare integration artifacts job fails on markup5ever / match_token / scraper / selectors (plus uuid, web-sys, wasm-bindgen-futures). Align the versions or extend the allowlist — see inline on Cargo.toml.

Non-blocking

🤔 thinking

  • Inline substring integration detection is false-positive proneanalyzer.rs:217; auto-enables gpt/didomi/datadome in the draft.
  • First-party classifier over-matches parent/eTLD domainsanalyzer.rs:169.
  • No overall navigation timeoutbrowser_collector.rs:98; a hanging origin stalls ts audit.

♻️ refactor

  • report_error is a misleading no-op wrappererror.rs:7; overlaps cli_error.

⛏ nitpick

  • Asymmetric URL resolutionanalyzer.rs:68; relative src from collector script tags is silently dropped.

👍 praise

  • AuditCollector trait → browser-free unit tests via FakeCollector; strong testability.
  • Browser hygiene: fresh temp user-data-dir per run, close() always called, handler task aborted on close failure, no forced --no-sandbox.
  • parse_audit_url restricts to http/https (blocks file://, data:, chrome://), with a test.
  • Overwrite protection + a pre-collection conflict check, so a refusal doesn't even launch Chrome.
  • GTM container id constrained by GTM-[A-Z0-9]+ → no TOML injection from page content into the draft.
  • audit module and all host-only deps correctly gated behind cfg(not(target_arch = "wasm32")).

CI Status

  • prepare integration artifacts (dependency parity): FAIL (caused by this PR)
  • integration / edgezero / browser tests: SKIPPED (blocked by the failed prepare job)
  • fmt / clippy / cargo test / vitest: not run — those workflows trigger only on PRs targeting main, and this PR targets feature/ts-cli-base. Author reports cargo test --package trusted-server-cli passing (42 tests) locally.

Comment thread Cargo.toml
mime = "0.3"
rand = "0.8"
regex = "1.12.3"
scraper = "0.24.0"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — This scraper = "0.24.0" (plus chromiumoxide) is the direct cause of the red prepare integration artifacts CI job. The workspace now resolves markup5ever 0.35 / selectors 0.31 / match_token 0.35, but crates/trusted-server-integration-tests pins scraper = "0.21" (markup5ever 0.14 / selectors 0.26), so the transitive-parity check fails (also flags uuid, web-sys, wasm-bindgen-futures).

Must be green before merge. Options:

  • Preferred: align scraper across the workspace and integration-tests (bump integration-tests to 0.24, or pin audit to 0.21) so the trees converge.
  • If the split is intentional: add markup5ever, match_token, scraper, selectors to transitive_parity_allowlist in scripts/check-integration-dependency-versions.sh, and run the suggested cargo update --manifest-path crates/trusted-server-integration-tests/Cargo.toml -p <crate> --precise <v> for the patch-level drifts (uuid, web-sys, wasm-bindgen-futures).

}

let lowered = script.to_ascii_lowercase();
for integration in ["gpt", "didomi", "datadome", "permutive", "lockr", "prebid"] {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — Lowercased substring matching is false-positive prone, especially "gpt" (3 chars) which can appear in unrelated script text. Hits for gpt/didomi/datadome later auto-set enabled = true in the draft config (audit.rs build_draft_config). Blast radius is low (the draft is reviewed and must pass ts config validate), but a word-boundary or more specific marker per integration would cut noise.

}
}

fn host_matches(page_host: &str, asset_host: &str) -> bool {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — The page_host.strip_suffix(asset_host) direction classifies assets served from a parent or public-suffix host as first-party — e.g. page foo.co.uk with an asset on co.uk resolves to FirstParty. Acceptable as a heuristic for an advisory tool, but a public-suffix-aware check (psl/publicsuffix) would avoid treating eTLD/parent domains as same-party.

report_error(format!("failed to create browser page for audit: {error}"))
})?;

page.goto(target_url.as_str())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingwait_for_page_settle caps at 6s, but goto / wait_for_navigation_response (and browser.close()) have no timeout. A slow or hanging origin can stall ts audit indefinitely. Consider wrapping navigation in tokio::time::timeout so the command fails cleanly with a partial-result warning instead of hanging.

Err(message.into())
}

pub(crate) fn report_error(message: impl Into<String>) -> String {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactorreport_error returns its input unchanged with no logging or other side effect, so the name overpromises and it overlaps cli_error. Either give it real behavior (e.g. log::error! before returning) or drop it and construct the String / use cli_error directly.


for tag in &collected.script_tags {
if let Some(src) = &tag.src {
if let Ok(asset_url) = Url::parse(src) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick — DOM scripts above use final_url.join(src) (relative-aware) while collector script_tags here use Url::parse(src) (absolute-only). A relative src coming from script_tags is silently dropped with no warning, unlike the DOM path which warns on unresolvable URLs. Consider final_url.join here too for symmetry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants