Skip to content

fix: avoid leaks in the per-ImageInput/Output error message#5276

Open
lgritz wants to merge 1 commit into
AcademySoftwareFoundation:mainfrom
lgritz:lg-errorleak
Open

fix: avoid leaks in the per-ImageInput/Output error message#5276
lgritz wants to merge 1 commit into
AcademySoftwareFoundation:mainfrom
lgritz:lg-errorleak

Conversation

@lgritz

@lgritz lgritz commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

(The same problem and set of fixes were applied to both ImageInput and ImageOutput. For brevity, we will only discuss ImageInput below.)

ImageInput error messages are stored in a thread_local map keyed by a unique per-instance id (m_id). The destructor's erase of that entry was commented out (over caution about the static destruction order fiasco), so every ImageInput that accumulated an error and was destroyed without its error being cleared via geterror() leaked one map entry. The id is monotonically increasing, so entries never get reused and the map grows without bound.

How did we find this? A fuzzer opening hundreds of thousands of corrupt files -- each of which errors, and each ImageInput::create() attempt tries several readers -- grew the map to tens of MB and eventually OOMed. The leak was traced to this issue.

So we re-enable the erase, guarded against the destruction order fiasco: the thread_local map is wrapped in a struct whose destructor clears an 'alive' flag, and the ImageInput destructor only touches the map while it is alive. Each ImageInput removes only its own entry (keyed by its unique id); other inputs' errors are untouched.

Cross-thread cleanup is still a known limitation -- errors accumulated in a different thread than the one that descroys the ImageInput are still leaked, noted in a TODO. However, this should be exceptionally rare, since not only do you need the two-thread situation, but the one that made the call that produced an error needs to not check for errors and return/clear the message. So only badly behaving user code that fails to check and retrieve errors should have this minor leak. And we are in the process of adding compile-time warnings for not capturing the error status values from our APIs.

Assisted-by: Claude Code / Claude Opus 4.8

(The same problem and set of fixes were applied to both ImageInput
and ImageOutput. For brevity, we will only discuss ImageInput below.)

ImageInput error messages are stored in a thread_local map keyed by a
unique per-instance id (m_id). The destructor's erase of that entry
was commented out (over caution about the static destruction order
fiasco), so every ImageInput that accumulated an error and was
destroyed without its error being cleared via geterror() leaked one
map entry. The id is monotonically increasing, so entries never get
reused and the map grows without bound.

(How did we find this? A fuzzer hit this hard: opening hundreds of
thousands of corrupt files -- each of which errors, and each
ImageInput::create() attempt tries several readers -- grew the map to
tens of MB and eventually OOMed. The leak was traced to this issue.)

So we re-enable the erase, guarded against the destruction order
fiasco: the thread_local map is wrapped in a struct whose destructor
clears an 'alive' flag, and the ImageInput destructor only touches the
map while it is alive. Each ImageInput removes only its own entry
(keyed by its unique id); other inputs' errors are untouched.

Cross-thread cleanup is still a known limitation -- errors accumulated
in a different thread than the one that descroys the ImageInput are
still leaked, noted in a TODO. However, this should be exceptionally
rare, since not only do you need the two-thread situation, but the one
that made the call that produced an error needs to not check for
errors and return/clear the message. So only badly behaving user code
that fails to check and retrieve errors should have this minor
leak. And we are in the process of adding compile-time warnings for
not capturing the error status values from our APIs.

Assisted-by: Claude Code / Claude Opus 4.8

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@lgritz lgritz requested a review from fpsunflower June 28, 2026 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant