When parsing PDFs with incomplete table borders (e.g., partial lines or missing cell boundaries), Table.getRow() crashes with:
TypeError: Cannot read properties of undefined (reading 'from')
at Table.getRow (src/pdf-parse/geometry/Table.ts:278)
at Table.toData (src/pdf-parse/geometry/Table.ts:312)
at LineStore.getTableData (src/pdf-parse/geometry/LineStore.ts:34)
at PDFParse.getTable (src/pdf-parse/PDFParse.ts:746)
Root cause
findBottomLineIndex() returns -1 when no matching horizontal line is found at a given x-coordinate. getRow() then reads this.hLines[-1], which is undefined because JavaScript arrays have no negative indices, so the subsequent access to bottomHLine.from.y throws.
// Table.ts — getRow()
const bottomHLineIndex = this.findBottomLineIndex(h2Index, xMiddle);
const bottomHLine = this.hLines[bottomHLineIndex]; // hLines[-1] = undefined
// ...
maxXY: new Point(rightVLine.from.x, bottomHLine.from.y), // 💥 crash
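To reproduce the failure outside pdf-parse, the sketch below uses hypothetical, simplified Point and Line types and a stand-in findBottomLineIndexSketch() that only mimics the behaviour described above (return -1 when no horizontal line spans the x-coordinate); it is an assumption for illustration, not the library's implementation. It shows that a -1 index yields undefined and that TypeScript's default settings do not flag the access. Enabling the noUncheckedIndexedAccess compiler option would type hLines[index] as Line | undefined and surface the problem at compile time.

```typescript
// Hypothetical, simplified stand-ins for pdf-parse's internal geometry types (assumption).
interface Point { x: number; y: number; }
interface Line { from: Point; to: Point; }

// Mimics the described behaviour: index of the first horizontal line below
// startIndex that spans x, or -1 when none does.
function findBottomLineIndexSketch(hLines: Line[], startIndex: number, x: number): number {
  for (let i = startIndex + 1; i < hLines.length; i++) {
    const line = hLines[i];
    if (line.from.x <= x && x <= line.to.x) return i;
  }
  return -1;
}

const hLines: Line[] = [
  { from: { x: 0, y: 0 }, to: { x: 100, y: 0 } },   // top border spans the full width
  { from: { x: 60, y: 20 }, to: { x: 100, y: 20 } }, // bottom border covers only part of the row
];

// x = 30 falls in the gap of the partial bottom border, so no line is found.
const index = findBottomLineIndexSketch(hLines, 0, 30);
console.log(index); // -1

// hLines[-1] is undefined, yet its static type is still Line under default compiler settings.
const bottomHLine = hLines[index];
console.log(bottomHLine); // undefined

try {
  console.log(bottomHLine.from.y);
} catch (err) {
  // Logs the TypeError message, e.g. "Cannot read properties of undefined (reading 'from')".
  console.log((err as Error).message);
}
```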
Suggested fix
Add a guard when findBottomLineIndex returns -1:
const bottomHLineIndex = this.findBottomLineIndex(h2Index, xMiddle);
if (bottomHLineIndex === -1) continue; // skip cell with missing bottom border
const bottomHLine = this.hLines[bottomHLineIndex];
This skips individual cells where the bottom border line is missing, while other cells, rows, and tables parse normally.
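A self-contained sketch of the guarded loop follows. Point, Line, Cell, getRowSketch() and this findBottomLineIndex() are simplified, hypothetical stand-ins (the real Table.getRow() keeps more state, takes h2Index and builds Point instances); only the guard itself comes from the suggested fix. With a bottom border that covers just the right half of the row, the left cell is skipped and the right cell is still produced.

```typescript
// Hypothetical, simplified types; not pdf-parse's actual classes (assumption).
interface Point { x: number; y: number; }
interface Line { from: Point; to: Point; }
interface Cell { minXY: Point; maxXY: Point; }

// Index of the first horizontal line below startIndex that spans x, or -1.
function findBottomLineIndex(hLines: Line[], startIndex: number, x: number): number {
  for (let i = startIndex + 1; i < hLines.length; i++) {
    const { from, to } = hLines[i];
    if (Math.min(from.x, to.x) <= x && x <= Math.max(from.x, to.x)) return i;
  }
  return -1;
}

// Builds the cells of one row whose top border is hLines[topIndex].
// vLines are the row's vertical lines, sorted by x.
function getRowSketch(hLines: Line[], vLines: Line[], topIndex: number): Cell[] {
  const cells: Cell[] = [];
  const topHLine = hLines[topIndex];
  for (let i = 0; i < vLines.length - 1; i++) {
    const leftVLine = vLines[i];
    const rightVLine = vLines[i + 1];
    const xMiddle = (leftVLine.from.x + rightVLine.from.x) / 2;

    const bottomHLineIndex = findBottomLineIndex(hLines, topIndex, xMiddle);
    if (bottomHLineIndex === -1) continue; // missing bottom border: skip this cell only
    const bottomHLine = hLines[bottomHLineIndex];

    cells.push({
      minXY: { x: leftVLine.from.x, y: topHLine.from.y },
      maxXY: { x: rightVLine.from.x, y: bottomHLine.from.y },
    });
  }
  return cells;
}

// Example: the bottom border only spans x = 50..100, so the cell at xMiddle = 25 is skipped.
const hLines: Line[] = [
  { from: { x: 0, y: 0 }, to: { x: 100, y: 0 } },
  { from: { x: 50, y: 20 }, to: { x: 100, y: 20 } },
];
const vLines: Line[] = [
  { from: { x: 0, y: 0 }, to: { x: 0, y: 20 } },
  { from: { x: 50, y: 0 }, to: { x: 50, y: 20 } },
  { from: { x: 100, y: 0 }, to: { x: 100, y: 20 } },
];
const row = getRowSketch(hLines, vLines, 0);
console.log(row.length); // 1
```

Skipping keeps the rest of the table parseable, at the cost of silently dropping the affected cell's content; the sketch makes no attempt to recover that cell from another boundary.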
Environment
- pdf-parse v2.4.5
- Node.js (server-side, NestJS)
- Multiple real-world PDFs with varying page counts (10–87 pages)
Related