Skip to content

fix(sec): seal merger-proxy + redemption + transactional SPAC deal recompute#169

Open
sroussey wants to merge 2 commits into
claude/wonderful-hypatia-y6cb4lfrom
claude/wonderful-hypatia-bko1yr
Open

fix(sec): seal merger-proxy + redemption + transactional SPAC deal recompute#169
sroussey wants to merge 2 commits into
claude/wonderful-hypatia-y6cb4lfrom
claude/wonderful-hypatia-bko1yr

Conversation

@sroussey

Copy link
Copy Markdown
Contributor

Summary

Four HIGH-priority findings from an automated security/correctness review of sec for the last 24h, scoped to extractors that #165 / #166 left half-wired.

Stacked on PR #165 (claude/wonderful-hypatia-y6cb4l). The seal helpers (verifyRowSpan, boundSourceSpan, multi-stage wrapUntrusted defang) are inherited from that PR; this PR extends them to the two missed extractors and closes a defang gap. Retarget to main after #165 merges.

  • HIGH-1 (security) — merger-proxy + redemption extractors bypassed fix: six HIGH-priority hardening fixes (prompt-injection seal + 8-K storage + XML entity expansion) #165's prompt-injection seal. Filer-controlled DEFM14A prospectus / 8-K narrative could ship unbounded source_span through SpacMergerExtractionRepo / SpacRedemptionExtractionRepo.
  • HIGH-2 (security, defense-in-depth) — wrapUntrusted defang missed </UNTRUSTED&Tab;FILER&Tab;DOCUMENT>-style tokens because &Tab; wasn't in the named-entity table and &/; broke the tag-scan regex. The per-call nonce on the real fence still holds — this closes the layered gap.
  • HIGH-3 (DX/observability) — processMergerProxy never wrote to extractor_runs. sec version coverage extractor merger-proxy read zero; drop-previous was a no-op.
  • HIGH-4 (correctness) — SpacReportWriter.recomputeAndSaveDeals deleted orphan deal rows then wrote new rows non-atomically. A crash mid-pass corrupted the SPAC report row.

Two commits, scoped per concern pair:

  1. fix(forms): apply prompt-injection seal to merger-proxy + redemption extractors (HIGH-1 + HIGH-2)
  2. fix(spac): record extractor_runs for merger-proxy + transactional recompute (HIGH-3 + HIGH-4)

Test plan

  • bun test src/sec/forms/registration-statements/s1/
  • bun test src/sec/forms/proxies-information-statements/
  • bun test src/sec/forms/miscellaneous-filings/
  • bun test src/storage/spac/
  • bun test src/storage/form-8k-event/
  • bun run build

Generated by Claude Code

claude added 2 commits June 25, 2026 08:33
…extractors

The merger-proxy and redemption extractors that landed via PR #166 missed
the new prompt-injection seal helpers introduced in PR #165. The seal — raw-
byte verifyRowSpan at gate, boundSourceSpan at persist — is now applied to
both extractors so an unbounded source_span can no longer ship through
SpacMergerExtractionRepo / SpacRedemptionExtractionRepo via filer-controlled
DEFM14A or post-vote 8-K narrative.

Also widen the fence defang to neutralize the </UNTRUSTED&Tab;FILER&Tab;
DOCUMENT> family of bypasses: add whitespace named entities (Tab, NewLine,
nbsp, ensp, emsp, thinsp, zwsp, zwnj, zwj) to NAMED_ENTITY_TABLE and collapse
numeric whitespace entities (&#9; / &#x20; etc.) to a single space before the
TAG_SHAPED scan. The per-call 64-bit nonce on the real fence remains the
primary defense; this closes the layered defang gap.

No extractor version bumps: prompt is unchanged in non-adversarial inputs,
the gate change is normalization-only.
…ompute

Two SPAC correctness issues:

1. processMergerProxy never wrote to extractor_runs. The outer
   ProcessAccessionDocFormTask records a run for the form's extractor id
   (DEFM14A), but the merger-proxy nested extractor id was uncovered, so
   `sec version coverage extractor merger-proxy` always read zero and
   `drop-previous` was a no-op. Mirrors the redemption recordRun pattern
   from PR #168: success at the end, PARSE_ERROR in the segmenter catch,
   PROVIDER_ERROR around runSection.
2. SpacReportWriter.recomputeAndSaveDeals deleted orphan deal rows then
   wrote new deals in a non-atomic loop. A crash, AbortSignal, or DB error
   between the delete and the final saveDeal corrupted the SPAC report
   row. New SpacDealReplace helper wraps the delete+upsert pass in a real
   transaction: better-sqlite3 `db.transaction` for SQLite, BEGIN/COMMIT/
   ROLLBACK on a checked-out PG client. In-memory fallback retains the
   sequential semantics (no concurrency in tests).

No extractor version bump: merger-proxy stays at 1.0.0; `coverage` will
simply start populating an empty table.

Copy link
Copy Markdown
Contributor Author

Review findings (HIGH) still open on this PR

After review of the diff against PR #165, three HIGH and one MEDIUM finding remain. None are in this PR's stated scope per the title; flagging so they can be addressed here or in a stacked follow-up.

H-1 (open): redemption extractor still does NOT record extractor_runs

This PR's title says "merger-proxy + redemption", but the diff only wires recordRun for processMergerProxy. src/sec/forms/miscellaneous-filings/redemption8k.ts has no ExtractorRunRepo import. sec version coverage extractor redemption still reads 0, and drop-previous remains a no-op. Mirror the Form_DEFM14A.storage.ts:99-118 recordMergerProxyRun pattern in redemption8k.ts for the success / PARSE_ERROR / thrown-from-runSection paths, and add a redemption8k.runs.test.ts mirror of Form_DEFM14A.runs.test.ts.

H-2 (open): Postgres recomputeSpacDeals is atomic but not serialized

replacePostgres in src/storage/spac/SpacDealReplace.ts wraps DELETE-then-INSERT in BEGIN/COMMIT (good — fixes torn-write), but at READ COMMITTED isolation two concurrent merger-proxy/redemption ingests for the same CIK can both read the same existingDeals snapshot and both delete the same orphan set, racing on the final deal_index series. Add SELECT pg_advisory_xact_lock($1::int, $2::int) immediately after BEGIN, with a 32-bit namespace constant and cik as the key. SQLite is single-writer so unaffected. Add a PG-mode describe.skipIf(...) concurrency test mirroring the SQLite rollback test.

M-1 (open): defang misses stacked zero-width entities not in \s

numericCollapsed in sectionExtractors.ts:163-167 only collapses numeric entities whose decoded codepoint matches /\s/. ZWSP / ZWNJ / ZWJ / WJ / soft hyphen aren't \s and survive into the final wrapped prompt. The per-call nonce is the real backstop (so this is defense-in-depth only, not exploitable), but broaden the callback to also strip the zero-width / bidi format-char set that stripFormatChars already strips: when cp is in [], return "" instead of leaving the entity literal. Add an injection test feeding stacked &amp;amp;amp;amp;amp;#x200B; past the 4-iteration decode cap.

H-3 (separate, PR #165 carry-over): version bumps still missing for merger-proxy / redemption

PR #165 bumped S-1 1.2.0→1.3.0 and 424 1.1.0→1.2.0 on the rationale "prompt-shape change drifts confidence calibration; treat as fresh dev cycle." Both extractMergerDeal and extractRedemption ship the same prompt-shape change (nonced fence, multi-stage defang) but Form_DEFM14A.storage.ts:33 and redemption8k.ts:29 still set DEFAULT_EXTRACTOR_VERSION = "1.0.0". Bump both to 1.1.0 and add the operator startDev/promote ceremony for both ids to the PR body.

Happy to open a stacked PR on top of this branch implementing all four if preferred — let me know.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants