Skip to content

fix(forms): restore predefined XML entity decoding without re-opening XXE#171

Open
sroussey wants to merge 1 commit into
claude/wonderful-hypatia-y6cb4lfrom
claude/wonderful-hypatia-y6cb4l-entity-fix
Open

fix(forms): restore predefined XML entity decoding without re-opening XXE#171
sroussey wants to merge 1 commit into
claude/wonderful-hypatia-y6cb4lfrom
claude/wonderful-hypatia-y6cb4l-entity-fix

Conversation

@sroussey

Copy link
Copy Markdown
Contributor

Summary

PR #165 disabled entity processing entirely (processEntities: false) to defang billion-laughs payloads. That also silently corrupted every XML form value carrying one of the five predefined XML entities — e.g. a filer name like Mac Accounting Group & CPAs, LLP was persisting as the literal four-character string & instead of the intended &. Any field containing &, <, >, ", or ' was affected.

This restores the standard predefined-entity decode (so all five round-trip normally) while keeping XXE / billion-laughs defenses in place.

Fix

Two layers of defense replace the heavy-handed processEntities: false:

  1. Bounded processEntities config. fast-xml-parser 5.8.0+ supports
    { enabled, maxEntityCount, maxExpansionDepth, maxExpandedLength, maxTotalExpansions }.
    A filer-declared chain bombs out at the configured limit instead of expanding geometrically.

  2. DOCTYPE strip. A new stripDoctype(xml) pass removes any leading <!DOCTYPE name [ ... ]> block before parsing — including the bracketed internal subset where <!ENTITY ...> declarations live — so filer-declared custom entities never reach the parser at all.

getParser() now returns a thin wrapper exposing parse(xml) that runs stripDoctype() before delegating to the underlying XMLParser, so every existing callsite (Form_D, Form_C, Form_1_A, Form_1_K, Form_1_Z, Form_8_K, Form_3/4/5, Form_144, Form_CFPORTAL) picks up the fix without code changes.

Test plan

  • Existing billion-laughs DOCTYPE test still parses in under 50 ms (no expansion).
  • Mac Accounting Group &amp;amp; CPAs, LLP in a Form D <entityName> round-trips through one decode step to Mac Accounting Group &amp; CPAs, LLP.
  • <!DOCTYPE foo [ <!ENTITY xxe "PWNED"> ]> is stripped, so &xxe; in the body parses as the literal &xxe; (NOT PWNED).
  • All five predefined entities (&amp; &lt; &gt; &quot; &apos;) decode to & < > " '.
  • bun test src/sec/forms/ — 423 tests pass across 56 files.
  • bun run build — clean.

Generated by Claude Code

…ities + DOCTYPE strip

PR #165 disabled entity processing entirely (processEntities: false) to defang
billion-laughs payloads. That also silently corrupted every XML form value
carrying one of the five predefined XML entities — e.g.
`Mac Accounting Group &amp; CPAs, LLP` was persisting as the literal
four-character string `&amp;` instead of the intended `&`.

This restores the standard predefined-entity decode (so `&amp;`, `&lt;`,
`&gt;`, `&quot;`, `&apos;` round-trip normally) while keeping XXE / billion-
laughs defenses in place:

  - fast-xml-parser's bounded processEntities config caps entity count,
    expansion depth, total expansions, and expanded length — a filer-declared
    chain bombs out at the limit instead of expanding geometrically.
  - A new stripDoctype() pass removes any leading <!DOCTYPE name [...]>
    block before parsing, so filer-declared entities never reach the parser
    at all.

getParser() now returns a thin wrapper that runs stripDoctype() before
parse(), keeping all callsites unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants