Memory corruption leading to Access Violation in .NET 9 using parallel reads on a Trie object #110088

Closed
Treit opened this issue Nov 22, 2024 · 6 comments

@Treit

Treit commented Nov 22, 2024

Description

A small demonstration program that performs parallel reads of a Trie data structure hits a memory corruption issue that crashes the program with an access violation. This does not reproduce on .NET 8 but does reproduce on .NET 9.

I constructed a minimal repro project here: https://github.com/treit/ktrieIssue/

The Trie implementation is from this project: https://github.com/kpol/trie/tree/master

Reproduction Steps

  1. Clone this repo: https://github.com/treit/ktrieIssue/
  2. dotnet run --framework net9.0

Expected behavior

No access violation crash

Actual behavior

The program will eventually crash with an access violation. In my experience it may happen immediately, or it may take a minute or two.

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at KTrie.Trie+<GetDescendantTerminalNodes>d__26.MoveNext()
   at KTrie.Trie+<GetTerminalNodes>d__20.MoveNext()
   at System.Linq.Enumerable+IEnumerableSkipTakeIterator`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at Program+<>c__DisplayClass0_0.<<Main>$>b__0()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef,System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Regression?

This appears to be a regression because I cannot reproduce it running on .NET 8.

Known Workarounds

No response

Configuration

Windows 11 x64 running on .NET 9.

I did not have a chance to try this on Linux.

I did notice that I could not reproduce it on my local workstation, but it reproduces on both my laptop and my cloud dev box.

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the "untriaged" label (New issue has not been triaged by the area owner) Nov 22, 2024
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@hez2010
Contributor

hez2010 commented Nov 22, 2024

Full stacktrace without inlining:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Collections.Generic.Queue`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Enqueue(System.__Canon)
   at KTrie.Trie+<GetDescendantTerminalNodes>d__26.MoveNext()
   at KTrie.Trie+<GetTerminalNodes>d__20.MoveNext()
   at System.Linq.Enumerable+IEnumerableSelectIterator`2[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Linq.Enumerable+IEnumerableSkipTakeIterator`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at Program+<>c__DisplayClass0_0.<<Main>$>b__0()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.Tasks.Task+<>c.<.cctor>b__292_0(System.Object)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(System.Object, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerDoWork(System.Threading.PortableThreadPool, Boolean ByRef)
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()

I noticed kpol/trie#5, which indicates that KTrie is not thread-safe. It would be better to confirm whether this is a runtime bug or an issue in the KTrie implementation.

UPDATE: found corrupted objects in the GC heap:

!dumpstack
OS Thread Id: 0x96a0 (38)
Current frame: 00007ff83994002e
Child-SP         RetAddr          Caller, Callee
000000F77E1FF150 00007ff839f3b258 (MethodDesc 00007ff83a078b98 + 0x28 System.Collections.Generic.Queue`1[[System.__Canon, System.Private.CoreLib]].Enqueue(System.__Canon)), calling (MethodDesc 00007ff839a98cd0 + 0 System.Runtime.CompilerServices.CastHelpers.StelemRef(System.Object[], IntPtr, System.Object))
000000F77E1FF160 00007ff839f2ed3a (MethodDesc 00007ff839b438c0 + 0x3a System.Runtime.CompilerServices.CastCache.TryGet(Int32[], UIntPtr, UIntPtr)), calling (MethodDesc 00007ff839b438a0 + 0 System.Runtime.CompilerServices.CastCache.Element(Int32 ByRef, Int32))
000000F77E1FF190 00007ff839f3b0db (MethodDesc 00007ff83a078138 + 0xdb KTrie.Trie+<GetDescendantTerminalNodes>d__26.MoveNext()), calling (MethodDesc 00007ff83a078b98 + 0 System.Collections.Generic.Queue`1[[System.__Canon, System.Private.CoreLib]].Enqueue(System.__Canon))
000000F77E1FF1A0 00007ff839f3c339 (MethodDesc 00007ff83a076660 + 0x29 System.Linq.Enumerable+IEnumerableSkipTakeIterator`1[[System.__Canon, System.Private.CoreLib]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>, Int32, Int32)), calling 00007ff839940010
000000F77E1FF1D0 00007ff839f3b2f3 (MethodDesc 00007ff83a038bb0 + 0x33 KTrie.Trie+<GetTerminalNodes>d__20.MoveNext())
...

!verifyheap
Segment          Object           Failure                          Reason
024dfa771250     020df501a960     InvalidMethodTable               Object 20df501a960 has an invalid method table 0
024dfa771250     020df502b210     InvalidMethodTable               Object 20df502b210 has an invalid method table 0
024dfa771250     020df503dae0     InvalidMethodTable               Object 20df503dae0 has an invalid method table 0
024dfa771250     020df5053740     InvalidMethodTable               Object 20df5053740 has an invalid method table 0
024dfa771250     020df50693a0     InvalidMethodTable               Object 20df50693a0 has an invalid method table 0
024dfa771250     020df506edf0     InvalidMethodTable               Object 20df506edf0 has an invalid method table 0

!listnearobj 20df501a960
Before:              020df5013548 29,720 (0x7418)                  KTrie.TrieNodes.CharTrieNode[]
Current:             020df501a960                                  Unknown
Error Detected: Object 20df501a960 has an invalid method table 0 [verify heap]
Next:                020df501e3a8 56 (0x38)                        KTrie.Trie+<GetTerminalNodes>d__20
Expected to find next object at 20df501a960, instead found it at 20df501e3a8.
Heap local consistency not confirmed.

This looks like a GC hole in Queue<T>, where Queue<T>._array somehow becomes invalid?

@jkotas
Member

jkotas commented Nov 22, 2024

StelemRef

Dup of #108763 ?

@AndyAyersMS
Member

I wonder if this is a case of calling a managed jit helper in a no gc region.

@Treit
Author

Treit commented Nov 23, 2024

StelemRef

Dup of #108763 ?

Confirmed that setting <CETCompat>false</CETCompat> fixes the issue, per suggestion from hez2010 / skyake on the C# discord.
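
For anyone applying the same workaround, here is a minimal sketch of where the property goes, assuming an SDK-style project file. The `OutputType` and `TargetFramework` values are illustrative assumptions and are not copied from the repro project; only the `CETCompat` line comes from this thread.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed values for illustration; not taken from the repro project. -->
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <!-- Workaround from this thread: opt the app out of CET shadow-stack compatibility. -->
    <CETCompat>false</CETCompat>
  </PropertyGroup>
</Project>
```

This only sidesteps the crash by turning off the hardware-enforced stack protection opt-in for the app, so it is probably best treated as temporary until a runtime fix is available.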

@jkotas
Member

jkotas commented Nov 25, 2024

I have confirmed that this is a duplicate of #108763, which is scheduled to be fixed in one of the upcoming .NET 9.0 servicing updates (#109548).

Thank you for reporting this issue!

@jkotas jkotas closed this as completed Nov 25, 2024
@dotnet-policy-service dotnet-policy-service bot removed the "untriaged" label (New issue has not been triaged by the area owner) Nov 25, 2024