fix(sec): seal merger-proxy + redemption + transactional SPAC deal recompute #169
sroussey wants to merge 2 commits into
…extractors

The merger-proxy and redemption extractors that landed via PR #166 missed the new prompt-injection seal helpers introduced in PR #165. The seal (raw-byte verifyRowSpan at the gate, boundSourceSpan at persist) is now applied to both extractors, so an unbounded source_span can no longer ship through SpacMergerExtractionRepo / SpacRedemptionExtractionRepo via filer-controlled DEFM14A or post-vote 8-K narrative.

Also widen the fence defang to neutralize the `</UNTRUSTED&Tab;FILER&Tab;DOCUMENT>` family of bypasses: add whitespace named entities (Tab, NewLine, nbsp, ensp, emsp, thinsp, zwsp, zwnj, zwj) to NAMED_ENTITY_TABLE and collapse numeric whitespace entities (numeric character references for tab and similar whitespace characters) to a single space before the TAG_SHAPED scan. The per-call 64-bit nonce on the real fence remains the primary defense; this closes the layered defang gap.

No extractor version bumps: the prompt is unchanged for non-adversarial inputs, and the gate change is normalization-only.
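The normalization step described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `collapseWhitespaceEntities`, `defangFenceTokens`, `FENCE_SHAPED`, and the replacement marker are hypothetical names, and the sketch simplifies by mapping every whitespace entity (named or numeric) to a single space before the tag-shaped scan.

```typescript
// Hypothetical sketch: the identifiers below are illustrative, not the PR's helpers.

// Named entities treated as whitespace (the names listed in the commit message).
const WHITESPACE_ENTITY_NAMES = new Set<string>([
  "Tab", "NewLine", "nbsp", "ensp", "emsp", "thinsp", "zwsp", "zwnj", "zwj",
]);

// Code points treated as whitespace when written as numeric references.
const WHITESPACE_CODE_POINTS = new Set<number>([
  0x09, 0x0a, 0x0d, 0x20, 0xa0, 0x2002, 0x2003, 0x2009, 0x200b, 0x200c, 0x200d,
]);

// Replace whitespace entities with one space; leave all other entities untouched.
function collapseWhitespaceEntities(input: string): string {
  return input
    .replace(/&([A-Za-z]+);/g, (match: string, name: string) =>
      WHITESPACE_ENTITY_NAMES.has(name) ? " " : match,
    )
    .replace(
      /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
      (match: string, hex?: string, dec?: string) => {
        const codePoint =
          hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10);
        return WHITESPACE_CODE_POINTS.has(codePoint) ? " " : match;
      },
    );
}

// Tag-shaped fence tokens: any whitespace (including raw zero-width characters)
// between the words, opening or closing form, case-insensitive.
const FENCE_SHAPED =
  /<\s*\/?\s*UNTRUSTED[\s\u200b-\u200d]+FILER[\s\u200b-\u200d]+DOCUMENT[^>]*>/gi;

// Normalize first, then neutralize anything that looks like the fence.
function defangFenceTokens(input: string): string {
  return collapseWhitespaceEntities(input).replace(
    FENCE_SHAPED,
    "[fence-like tag removed]",
  );
}
```

Normalizing before the scan keeps the scan regex simple: it only has to recognize plain whitespace, because `&`, `#`, and `;` no longer appear between the fence words. As the commit message notes, this is a secondary layer; the per-call nonce remains the primary defense.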
…ompute

Two SPAC correctness issues:

1. processMergerProxy never wrote to extractor_runs. The outer ProcessAccessionDocFormTask records a run for the form's extractor id (DEFM14A), but the nested merger-proxy extractor id was uncovered, so `sec version coverage extractor merger-proxy` always read zero and `drop-previous` was a no-op. The fix mirrors the redemption recordRun pattern from PR #168: success at the end, PARSE_ERROR in the segmenter catch, PROVIDER_ERROR around runSection.

2. SpacReportWriter.recomputeAndSaveDeals deleted orphan deal rows and then wrote new deals in a non-atomic loop. A crash, AbortSignal, or DB error between the delete and the final saveDeal corrupted the SPAC report row. The new SpacDealReplace helper wraps the delete+upsert pass in a real transaction: better-sqlite3 `db.transaction` for SQLite, BEGIN/COMMIT/ROLLBACK on a checked-out PG client. The in-memory fallback retains the sequential semantics (no concurrency in tests).

No extractor version bump: merger-proxy stays at 1.0.0; `coverage` will simply start populating an empty table.
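For readers unfamiliar with the BEGIN/COMMIT/ROLLBACK shape mentioned for the PG path, a minimal sketch follows. It is not the SpacDealReplace helper itself: the `TxClient` interface, the `Deal` shape, the `replaceDeals` function, and the `spac_deals` table and column names are all assumptions made for illustration.

```typescript
// Hypothetical sketch: interface, table, and column names are assumptions.

// Minimal shape of a checked-out client; only a query method is needed here.
interface TxClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

interface Deal {
  id: string;
  spacId: string;
}

// Delete orphan rows and upsert the new set as one atomic unit.
async function replaceDeals(
  client: TxClient,
  spacId: string,
  deals: Deal[],
): Promise<void> {
  await client.query("BEGIN");
  try {
    // Remove rows for this SPAC that are not in the recomputed set.
    const keepIds = deals.map((deal) => deal.id);
    await client.query(
      "DELETE FROM spac_deals WHERE spac_id = $1 AND NOT (id = ANY($2))",
      [spacId, keepIds],
    );
    // Upsert each recomputed deal.
    for (const deal of deals) {
      await client.query(
        "INSERT INTO spac_deals (id, spac_id) VALUES ($1, $2) " +
          "ON CONFLICT (id) DO UPDATE SET spac_id = EXCLUDED.spac_id",
        [deal.id, deal.spacId],
      );
    }
    await client.query("COMMIT");
  } catch (err) {
    // Any failure between BEGIN and COMMIT undoes the delete as well.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

The point of the shape is that the delete and every upsert share one transaction on one client, so a failure partway through leaves the previous rows in place instead of a half-written set.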
Review findings (HIGH) still open on this PR

After review of the diff against PR #165, three HIGH and one MEDIUM finding remain. None are in this PR's stated scope per the title; flagging so they can be addressed here or in a stacked follow-up.

H-1 (open): redemption extractor still does NOT record
Summary
Four HIGH-priority findings from an automated security/correctness review of `sec` for the last 24h, scoped to extractors that #165 / #166 left half-wired.

Stacked on PR #165 (`claude/wonderful-hypatia-y6cb4l`). The seal helpers (`verifyRowSpan`, `boundSourceSpan`, multi-stage `wrapUntrusted` defang) are inherited from that PR; this PR extends them to the two missed extractors and closes a defang gap. Retarget to `main` after #165 merges.

- HIGH-1: The merger-proxy and redemption extractors could ship an unbounded `source_span` through `SpacMergerExtractionRepo` / `SpacRedemptionExtractionRepo`.
- HIGH-2: The `wrapUntrusted` defang missed `</UNTRUSTED&Tab;FILER&Tab;DOCUMENT>`-style tokens because `&Tab;` wasn't in the named-entity table and `&` / `;` broke the tag-scan regex. The per-call nonce on the real fence still holds; this closes the layered gap.
- HIGH-3: `processMergerProxy` never wrote to `extractor_runs`. `sec version coverage extractor merger-proxy` read zero; `drop-previous` was a no-op.
- HIGH-4: `SpacReportWriter.recomputeAndSaveDeals` deleted orphan deal rows and then wrote new rows non-atomically. A crash mid-pass corrupted the SPAC report row.

Two commits, scoped per concern pair:

- `fix(forms): apply prompt-injection seal to merger-proxy + redemption extractors` (HIGH-1 + HIGH-2)
- `fix(spac): record extractor_runs for merger-proxy + transactional recompute` (HIGH-3 + HIGH-4)

Test plan

- `bun test src/sec/forms/registration-statements/s1/`
- `bun test src/sec/forms/proxies-information-statements/`
- `bun test src/sec/forms/miscellaneous-filings/`
- `bun test src/storage/spac/`
- `bun test src/storage/form-8k-event/`
- `bun run build`

Generated by Claude Code