
Fix NativeAOT GC hole issue #129598

Merged
jkotas merged 7 commits into dotnet:main from janvorli:fix-nativeaot-eh-gc-hole on Jun 20, 2026

@janvorli

Member

There is a GC hole when:

  • an exception is rethrown from a funclet
  • the exception escapes that funclet
  • a finally is executed for this secondary exception
  • GC runs while the call chain of this finally is being executed
  • a reference in a nonvolatile register is pushed in the prolog of one of the functions in the finally call chain
  • the nonvolatile register holds a live reference somewhere up the call chain of the parent of the catch handler that catches the secondary exception
  • the nonvolatile register is not pushed anywhere between the parent of the catch and the frame where the nonvolatile register holds a live GC reference

In this case, if the GC relocates that reference, it is updated in the stack frame of the finally call chain, but not in the location referenced by the REGDISPLAY in the ExInfo of the secondary exception. So when we resume after the catch, the stale reference is placed in the nonvolatile register and then bubbles up the call chain until it reaches the frame where the register is supposed to hold a live GC reference.

The fix is to save the nonvolatile registers, after returning from a finally funclet, back to the location referenced by the REGDISPLAY passed to RhpCallFinallyFunclet.
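The failure mode and the fix can be illustrated with a deliberately simplified model. The sketch below is not the runtime's code (the real change is in the AMD64 assembly stubs); `Slot`, `gc_relocate`, `run_finally_call_chain`, and `resume_after_catch` are hypothetical names, and a single register `rbx` stands in for all nonvolatile registers. It assumes, as described above, that the GC updates the copy spilled in the finally call chain but not the location the REGDISPLAY points at.

```python
from dataclasses import dataclass

OLD_ADDR, NEW_ADDR = 0x1000, 0x2000  # object address before / after relocation


@dataclass
class Slot:
    """One memory location holding an object address (toy model)."""
    value: int


def gc_relocate(reported_slots, old, new):
    """Toy GC: fixes up only the slots it was told about."""
    for slot in reported_slots:
        if slot.value == old:
            slot.value = new


def run_finally_call_chain(regs):
    """A callee in the finally call chain spills rbx, a GC runs, rbx is reloaded."""
    spill = Slot(regs["rbx"])                 # prolog: push rbx
    gc_relocate([spill], OLD_ADDR, NEW_ADDR)  # GC sees the spill, not the REGDISPLAY home
    regs["rbx"] = spill.value                 # epilog: pop rbx (now the relocated address)


def resume_after_catch(write_back):
    """Return the value rbx holds when execution resumes after the catch."""
    regs = {"rbx": OLD_ADDR}
    regdisplay_rbx_home = Slot(OLD_ADDR)  # location the toy REGDISPLAY points at
    run_finally_call_chain(regs)
    if write_back:
        # The fix in this model: store the current register value back
        # to the location referenced by the REGDISPLAY.
        regdisplay_rbx_home.value = regs["rbx"]
    regs["rbx"] = regdisplay_rbx_home.value  # resume: restore rbx from REGDISPLAY
    return regs["rbx"]
```

In this model, `resume_after_catch(False)` yields the stale address, while `resume_after_catch(True)` yields the relocated one, which mirrors why writing the registers back after the funclet returns closes the hole.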

Close #129010

@janvorli janvorli requested review from jakobbotsch and jkotas June 18, 2026 23:04
@janvorli janvorli self-assigned this Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 23:04
@janvorli

Member Author

@jakobbotsch thank you so much for reproducing it with time travel debugging; it would have been hard to reason about without that!

@dotnet-policy-service

Contributor

Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the NativeAOT AMD64 exception-handling stubs so that after executing a finally funclet, the current values of preserved (non-volatile) registers are written back to the locations described by the passed-in REGDISPLAY (and, on Windows x64, the preserved XMM register values are written back into REGDISPLAY). This keeps the REGDISPLAY state consistent if a GC occurs during the finally call chain and relocates references that were temporarily spilled.

Changes:

  • Add preserved-register write-back in the System V AMD64 RhpCallFinallyFunclet2 path.
  • Add preserved-register + XMM6–XMM15 write-back in the Windows AMD64 RhpCallFinallyFunclet2 path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/coreclr/nativeaot/Runtime/amd64/ExceptionHandling.S | Writes back AMD64 SysV preserved integer registers to the homes referenced by REGDISPLAY after the finally funclet returns. |
| src/coreclr/nativeaot/Runtime/amd64/ExceptionHandling.asm | Writes back Windows x64 preserved integer registers (and XMM6–XMM15) to REGDISPLAY state after the finally funclet returns. |

@jkotas

jkotas commented Jun 19, 2026

Member

Can we add a regression test for this?

Could you please change "Funclets are not required to preserve non-volatile registers." in https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md to "Funclets are not required to preserve non-volatile registers that are saved by main method body."

Thank you both!

@janvorli

Member Author

@jkotas, I've added a regression test and updated the ABI doc. Can you please take another look?

Comment thread src/tests/Regressions/coreclr/GitHub_129010/Test129010.csproj Outdated
Comment thread src/tests/Regressions/coreclr/GitHub_129010/Test129010.csproj Outdated

@jkotas jkotas left a comment

Member

Thanks!

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Copilot AI review requested due to automatic review settings June 19, 2026 17:14

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread src/tests/Regressions/coreclr/GitHub_129010/test129010.cs
Comment thread src/tests/Regressions/coreclr/GitHub_129010/test129010.cs
Comment thread docs/design/coreclr/botr/clr-abi.md Outdated
Comment thread src/tests/Regressions/coreclr/GitHub_129010/test129010.cs Outdated
Comment thread src/tests/Regressions/coreclr/GitHub_129010/test129010.cs Outdated
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Copilot AI review requested due to automatic review settings June 19, 2026 17:57

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread docs/design/coreclr/botr/clr-abi.md
Comment thread src/coreclr/nativeaot/Runtime/amd64/ExceptionHandling.asm Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 19, 2026 18:35

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread docs/design/coreclr/botr/clr-abi.md Outdated
Comment thread docs/design/coreclr/botr/clr-abi.md Outdated
Copilot AI review requested due to automatic review settings June 19, 2026 19:18

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@jkotas

jkotas commented Jun 20, 2026

Member

/azp run runtime-nativeaot-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas

jkotas commented Jun 20, 2026

Member

/ba-g flaky networking test, passed on automatic rerun

@jkotas

jkotas commented Jun 20, 2026

Member

@MichalStrehovsky @jakobbotsch Could you please approve this? My approval is not good enough since I merged a doc nit from Copilot.

@jkotas jkotas merged commit e556d5a into dotnet:main Jun 20, 2026
140 of 145 checks passed
@jkotas

jkotas commented Jun 21, 2026

Member

Backport candidate?

@tannergooding

Member

> Backport candidate?

We don't currently have any way to help raise to users that they may want to patch and redeploy their NAOT apps, right?

They just need to monitor the monthly servicing release notes and make a judgement call on if it is impactful to them?

@jkotas

jkotas commented Jun 21, 2026

Member

This is a concern for all dependencies that are bundled in the app or otherwise affect the app code; it is not specific to NativeAOT.

It takes a lot of expertise to evaluate how a given (security) fix affects a given app. Some teams may choose to do that. However, the easiest strategy is to regularly update to the latest servicing release and redeploy without trying to reason about individual changes.

@MichalStrehovsky

Member

> We don't currently have any way to help raise to users that they may want to patch and redeploy their NAOT apps, right?

I would generalize that to "patch and redeploy their self-contained apps". It's not specific to NAOT and affects all forced-selfcontained environments (mobile/WASM). Or even framework-dependent where there is no servicing story for the framework (macOS). Windows and Linux might be the only "nice" exceptions where one doesn't have to think about it.

@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone Jun 22, 2026
@janvorli

Member Author

/backport to release/10.0

@github-actions

Contributor

Started backporting to release/10.0 (link to workflow run)

svick pushed a commit that referenced this pull request Jun 23, 2026
Backport of #129598 to release/10.0

/cc @janvorli

## Customer Impact

- [ ] Customer reported
- [x] Found internally

There is a GC hole in NativeAOT. Like any other GC hole, it could lead to
intermittent application failures due to an unexpected
NullReferenceException, AccessViolationException, or other unexpected
behavior. The hole can occur when the GC scans a stack with active
exception handling: an exception thrown from the call chain of a funclet
escapes the funclet, and a GC occurs while a finally handler for that
secondary exception is executing.

## Regression

- [x] Yes
- [ ] No

Introduced by #115284 in .NET 10.0

## Testing

Libraries test that exposed the issue, directed regression test, CI
coreclr and libraries tests.

## Risk

Low. The change just ensures that a modified non-volatile register value
is saved in the stack frame iterator of the pending exception handling,
keeping that value up to date in case GC moves it.

---------

Co-authored-by: Jan Vorlicek <janvorli@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[ci-scan] Test failure: NumberHandlingTests_Metadata.Number_AsCollectionElement_RoundTrip on NativeAOT windows-x64