fix: avoid leaks in the per-ImageInput/Output error message#5276
Open
lgritz wants to merge 1 commit into
Open
Conversation
(The same problem and set of fixes were applied to both ImageInput and ImageOutput. For brevity, we will only discuss ImageInput below.) ImageInput error messages are stored in a thread_local map keyed by a unique per-instance id (m_id). The destructor's erase of that entry was commented out (over caution about the static destruction order fiasco), so every ImageInput that accumulated an error and was destroyed without its error being cleared via geterror() leaked one map entry. The id is monotonically increasing, so entries never get reused and the map grows without bound. (How did we find this? A fuzzer hit this hard: opening hundreds of thousands of corrupt files -- each of which errors, and each ImageInput::create() attempt tries several readers -- grew the map to tens of MB and eventually OOMed. The leak was traced to this issue.) So we re-enable the erase, guarded against the destruction order fiasco: the thread_local map is wrapped in a struct whose destructor clears an 'alive' flag, and the ImageInput destructor only touches the map while it is alive. Each ImageInput removes only its own entry (keyed by its unique id); other inputs' errors are untouched. Cross-thread cleanup is still a known limitation -- errors accumulated in a different thread than the one that descroys the ImageInput are still leaked, noted in a TODO. However, this should be exceptionally rare, since not only do you need the two-thread situation, but the one that made the call that produced an error needs to not check for errors and return/clear the message. So only badly behaving user code that fails to check and retrieve errors should have this minor leak. And we are in the process of adding compile-time warnings for not capturing the error status values from our APIs. Assisted-by: Claude Code / Claude Opus 4.8 Signed-off-by: Larry Gritz <lg@larrygritz.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
(The same problem and set of fixes were applied to both ImageInput and ImageOutput. For brevity, we will only discuss ImageInput below.)
ImageInput error messages are stored in a thread_local map keyed by a unique per-instance id (m_id). The destructor's erase of that entry was commented out (over caution about the static destruction order fiasco), so every ImageInput that accumulated an error and was destroyed without its error being cleared via geterror() leaked one map entry. The id is monotonically increasing, so entries never get reused and the map grows without bound.
How did we find this? A fuzzer opening hundreds of thousands of corrupt files -- each of which errors, and each ImageInput::create() attempt tries several readers -- grew the map to tens of MB and eventually OOMed. The leak was traced to this issue.
So we re-enable the erase, guarded against the destruction order fiasco: the thread_local map is wrapped in a struct whose destructor clears an 'alive' flag, and the ImageInput destructor only touches the map while it is alive. Each ImageInput removes only its own entry (keyed by its unique id); other inputs' errors are untouched.
Cross-thread cleanup is still a known limitation -- errors accumulated in a different thread than the one that descroys the ImageInput are still leaked, noted in a TODO. However, this should be exceptionally rare, since not only do you need the two-thread situation, but the one that made the call that produced an error needs to not check for errors and return/clear the message. So only badly behaving user code that fails to check and retrieve errors should have this minor leak. And we are in the process of adding compile-time warnings for not capturing the error status values from our APIs.
Assisted-by: Claude Code / Claude Opus 4.8