Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes hang in WinUI apps published to AOT #104583

Merged
merged 8 commits into from
Sep 12, 2024

Conversation

manodasanW
Copy link
Contributor

In the AOT version of ComWrappers, we keep track of the native object wrappers that implement IReferenceTracker and iterate on them during GC callbacks as part of the reference walk. We were running into scenarios where GCHandles in that collection were already freed but are still in the collection and thereby causing unexpected exceptions. This looks to be the result of a race that is happening when we are removing from that collection and adding to it as the removal has no locking at all around it and is failing randomly. The adding to the collection does have some locking, but that locking is for another cache and isn't consistent everywhere we add to it. This change adds a lock which is used during interactions with that collection. The lock is not used during the GC callback scenarios as the threads would be frozen already and we can end up in a scenario where the lock is already being held.

Given we are now doing locking during removal in the finalizer, I am also updating the collection type to a HashSet rather than a List. This makes it consistent with what we do in the JIT version of this implementation and also looking at some brief traces of WinUIGallery in both scenarios, there is definitely an improvement seen during removal with the HashSet approach while not much additional cost when adding it or iterating over it.

Fixes #104582

…rom and adding to the nativeObjectReference list around the same time and causing removal to fail.
…d to improve performance of remove given it now takes a lock
@hez2010
Copy link
Contributor

hez2010 commented Aug 28, 2024

Related: microsoft/microsoft-ui-xaml#9928
I think this need to be fixed before WASDK 1.6 GA.

@Sergio0694
Copy link
Contributor

"I think this need to be fixed before WASDK 1.6 GA."

This seems infeasible at this point. WASDK 1.6 will go GA shortly, and this fix will likely only make it to .NET 9 GA at the earliest.

@FrayxRulez
Copy link

FrayxRulez commented Aug 28, 2024

This seems infeasible at this point. WASDK 1.6 will go GA shortly, and this fix will likely only make it to .NET 9 GA at the earliest.

What’s a reasonable ETA in months? So that I can drop WinUI for now and go back to UWP, while planning for the next steps.

@manodasanW
Copy link
Contributor Author

@FrayxRulez I don't have an ETA I can state yet as I still need to have a conversation with the .NET folks. But I do have plans on getting the PR feedback addressed very soon so we can then figure that out.

@Scottj1s
Copy link

Scottj1s commented Sep 6, 2024

Is there any conceivable workaround for this, while awaiting a .NET update?

@agocke agocke added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 6, 2024
@hez2010
Copy link
Contributor

hez2010 commented Sep 9, 2024

Is there any conceivable workaround for this

No. Unfortunately it's a show stopper.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@manodasanW manodasanW merged commit d7948dc into dotnet:main Sep 12, 2024
88 checks passed
@agocke
Copy link
Member

agocke commented Sep 12, 2024

/backport to release/9.0

Copy link
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10836541436

@manodasanW
Copy link
Contributor Author

/backport to release/8.0-staging

Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/10837018960

Copy link
Contributor

@manodasanW an error occurred while backporting to release/8.0-staging, please check the run log for details!

Error: @manodasanW is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your dotnet team membership visibility is set to Public on https://github.com/orgs/dotnet/people?query=manodasanW

@jkoritzinsky
Copy link
Member

/backport to release/8.0-staging

Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/10838290100

Copy link
Contributor

@jkoritzinsky backporting to release/8.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix exception during GC callback in WinUI scenarios due to removing from and adding to the nativeObjectReference list around the same time and causing removal to fail.
Using index info to reconstruct a base tree...
M	src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs
Applying: Move common logic
Using index info to reconstruct a base tree...
M	src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs
Applying: Change collection type to HashSet to match with JIT implementation and to improve performance of remove given it now takes a lock
Applying: Move to using a custom collection to protect against GC freezing threads during add / remove.
error: sha1 information is lacking or useless (src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0004 Move to using a custom collection to protect against GC freezing threads during add / remove.
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Copy link
Contributor

@jkoritzinsky an error occurred while backporting to release/8.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

manodasanW added a commit to manodasanW/runtime that referenced this pull request Sep 12, 2024
* Fix exception during GC callback in WinUI scenarios due to removing from and adding to the nativeObjectReference list around the same time and causing removal to fail.

* Move common logic

* Change collection type to HashSet to match with JIT implementation and to improve performance of remove given it now takes a lock

* Move to using a custom collection to protect against GC freezing threads during add / remove.

* Address PR feedback by using alternative approach to handle race

* Address PR feedback

* Address PR feedback around handle being freed.
@manodasanW manodasanW deleted the manodasanw/fixComWrappersAotHang branch September 12, 2024 21:28
jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
* Fix exception during GC callback in WinUI scenarios due to removing from and adding to the nativeObjectReference list around the same time and causing removal to fail.

* Move common logic

* Change collection type to HashSet to match with JIT implementation and to improve performance of remove given it now takes a lock

* Move to using a custom collection to protect against GC freezing threads during add / remove.

* Address PR feedback by using alternative approach to handle race

* Address PR feedback

* Address PR feedback around handle being freed.
@FrayxRulez
Copy link

Is this fix already available in either .NET 9 or .NET 8?

@filipnavara
Copy link
Member

Is this fix already available in either .NET 9 or .NET 8?

No. It will be available in .NET 9 RC2.

sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
* Fix exception during GC callback in WinUI scenarios due to removing from and adding to the nativeObjectReference list around the same time and causing removal to fail.

* Move common logic

* Change collection type to HashSet to match with JIT implementation and to improve performance of remove given it now takes a lock

* Move to using a custom collection to protect against GC freezing threads during add / remove.

* Address PR feedback by using alternative approach to handle race

* Address PR feedback

* Address PR feedback around handle being freed.
@github-actions github-actions bot locked and limited conversation to collaborators Oct 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception thrown in GC callback while walking native object wrappers on AOT
10 participants