
Intermittent hang/deadlock in .NET Core 5.0 RC1 while debugging #42375

Closed · joakimriedel opened this issue Sep 17, 2020 · 58 comments
Labels: area-TieredCompilation-coreclr, tenet-reliability

joakimriedel commented Sep 17, 2020

Description

After upgrading our web application from 3.1 to 5.0 RC1, I get intermittent hangs/deadlocks while debugging. At first, I thought it was a problem specific to Visual Studio 16.8 preview 3 (https://developercommunity.visualstudio.com/content/problem/1187332/debugger-hangs-sporadically-in-visual-studio-2019.html), but I have now verified that the issue is the same when debugging through Visual Studio Code, which makes me think this is related to the 5.0 RC1 runtime and not the IDE.

I have not been able to reproduce this when running the project without debugging (Ctrl+F5) - only when debugging (F5).

Unfortunately, I cannot reliably reproduce this; it seems to be timing related and happens more often when loading pages in the application that fire a lot of different connections to the server. The application also makes heavy use of EF Core and SignalR Core.

Configuration

.NET Core 5.0 RC1
Win10 build 2004
Threadripper CPU

Regression?

Yes. This never occurred in .NET Core 3.1.

Other information

I find it very hard to debug this issue. It only happens when starting the project in the debugger, but since it totally hangs the debugger, I am unable to break into the application. I cannot attach another debugger either - I get the error "another debugger is already attached".

Some characteristics:

When it happens, the diagnostics logger stops updating

[screenshot]

I cannot break or terminate the debugger

[screenshots]

The IIS worker process is idle.

[screenshot]

The output window stops logging any more entries. The last entry is random - I cannot see any pattern suggesting the hang happens after a certain kind of event.

The only way to get back control is to kill the IIS worker process manually through task manager.

I have followed the steps to generate a dotnet-dump, but do not know how to analyze it.

How can I go forward to resolve this? I can reproduce it, but not on demand. Sometimes it hangs on the first load; sometimes I have to click around in the application for a few minutes to reproduce it.

Dotnet-GitSync-Bot added the area-Diagnostics-coreclr and untriaged labels on Sep 17, 2020
ghost commented Sep 17, 2020

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

joakimriedel (Author) commented:

Found the dotnet-trace tool and ran it for a few seconds on the hung process. It is not totally dead; it seems to be doing some GC-related things, as frequently seen in PerfView:

[PerfView screenshot]

Can I use the dotnet-trace tool with some arguments to better find out what's happening behind the scenes?

joakimriedel (Author) commented:

Some more debugging. If I start with ctrl-f5 (no debugger) and later...

  1. attach a native debugger such as WinDbg: does not reproduce
  2. attach a managed debugger from Visual Studio: reproduces
  3. first attach a native debugger (WinDbg), then detach, and then attach a managed debugger (Visual Studio): does not reproduce

The third could be a red herring since it's an intermittent issue, but it seems consistent. Also worth noting in the third case: the diagnostics tool in VS reports 1.3 GB of memory after I have been using a native debugger, but only ~700 MB when I use a managed debugger.

tommcdon removed the untriaged label on Sep 17, 2020
tommcdon added this to the 5.0.0 milestone on Sep 17, 2020
tommcdon (Member) commented:

@hoyosjs

tommcdon (Member) commented:

@joakimriedel would it be possible to send us a dump of the debuggee, devenv, and msvsmon at the point of the hang? To securely share files, please open a VS Feedback Item and send us a link to it.
@caslan FYI

joakimriedel (Author) commented:

@tommcdon please see the original VS Feedback Item in the first issue post above: https://developercommunity.visualstudio.com/content/problem/1187332/debugger-hangs-sporadically-in-visual-studio-2019.html

joakimriedel (Author) commented:

Btw, the breakpoints mentioned in the VS Feedback Item were another red herring. This reproduces without setting any breakpoints in the code.

joakimriedel (Author) commented:

@tommcdon I just found something! How often this error happens depends on the LogLevel I set. With everything set to "Debug", it reproduces almost instantly on the first load of the application.

[screenshot of the LogLevel configuration]
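For reference, roughly the kind of Logging configuration this refers to (a sketch only - the actual category names in my appsettings differ):

{
  "Logging": {
    "LogLevel": {
      "Default": "Debug",
      "Microsoft": "Debug",
      "Microsoft.EntityFrameworkCore": "Debug"
    }
  }
}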

If I set all LogLevels to "Warning" (few or no log entries), I cannot seem to reproduce it.

Does this give you any pointers on where the problem might lie?

josalem (Contributor) commented Sep 17, 2020

I have followed the steps to generate a dotnet-dump, but do not know how to analyze it.

@joakimriedel, if you collected a dump using dotnet-dump, you can use the dotnet-dump analyze verb to inspect it. If you run dotnet-dump analyze /path/to/dump and then enter the command clrstack -f -all it should dump the callstacks of all the threads in the process. It will include function names from your app, so if they're sensitive be sure to read the data and remove anything you need to before sharing here.
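For example, the whole sequence might look like this (a sketch - the PID and dump path are placeholders):

dotnet-dump collect -p <pid> -o ./hang.dmp
dotnet-dump analyze ./hang.dmp
> clrstack -f -all
> exit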

joakimriedel (Author) commented Sep 18, 2020

@josalem very interesting, thanks for the tip! All threads but these three are in either System.Threading.Monitor.ObjWait or System.Threading.WaitHandle.WaitOneCore:

thread A

OS Thread Id: 0x757c
        Child SP               IP Call Site
000000010A158B38                  [HelperMethodFrame_PROTECTOBJ: 000000010a158b38] System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
000000010A158C50 00007FF8E50B75BB System.Private.CoreLib.dll!System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow() + 27 [/_/src/coreclr/src/System.Private.CoreLib/src/System/Diagnostics/Debugger.cs @ 35]
... 150+ deep call stack

thread B

OS Thread Id: 0x45a8
        Child SP               IP Call Site
000000010D2499C8                  [HelperMethodFrame_2OBJ: 000000010d2499c8] System.Diagnostics.Debugger.Log(Int32, System.String, System.String)
000000010D249D00 00007FF8E50B2065 System.Private.CoreLib.dll!System.Diagnostics.DebugProvider.WriteCore(System.String) + 133 [/_/src/libraries/System.Private.CoreLib/src/System/Diagnostics/DebugProvider.Windows.cs @ 62]
000000010D249D70 00007FF8E50B1EA6 System.Private.CoreLib.dll!System.Diagnostics.DebugProvider.Write(System.String) + 134 [/_/src/libraries/System.Private.CoreLib/src/System/Diagnostics/DebugProvider.cs @ 64]
... 140+ deep call stack

thread C

OS Thread Id: 0x5dc4
        Child SP               IP Call Site
000000012497E5C0                  [InlinedCallFrame: 000000012497e5c0] System.Data.SqlClient.dll!System.Data.SqlClient.SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Data.SqlClient.SNIHandle, IntPtr ByRef, Int32)
000000012497E5C0                  [InlinedCallFrame: 000000012497e5c0] System.Data.SqlClient.dll!System.Data.SqlClient.SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Data.SqlClient.SNIHandle, IntPtr ByRef, Int32)
... 30+ deep call stack

My amateur guess is that these three threads are deadlocking each other somehow. I made two dumps of the hung process a minute apart, and it showed the same three threads with the same call stack.

This actually strengthens the observation I had above that modifying the LogLevel will increase the likelihood of reproducing the error.

ping @tommcdon

tommcdon (Member) commented:

@joakimriedel thank you for sending the VS feedback item our way. It looks like there may be a thread suspension related issue here that I am hoping @kouvel or @noahfalk can take a look at. It appears that the debugger is requesting debugger suspension but we are not able to synchronize for the debugger. If this is a regression between P8 and RC1, then the stack below might be interesting. Out of curiosity, can you try disabling tiered compilation by setting COMPLUS_TieredCompilation=0?

0:181> !ThreadState 1029228
    Debug Suspend Pending
    Legal to Join
    Background
    CLR Owns
    In Multi Threaded Apartment
    Fully initialized
    Thread Pool Worker Thread
0:181> k
 # Child-SP          RetAddr           Call Site
00 00000035`1154f118 00007ff9`b1ff619d ntdll!ZwWaitForAlertByThreadId+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 3891] 
01 00000035`1154f120 00007ff9`b1ff6052 ntdll!RtlpWaitOnAddressWithTimeout+0x81 [minkernel\ntos\rtl\waitaddr.c @ 851] 
02 00000035`1154f150 00007ff9`b1ff5e6d ntdll!RtlpWaitOnAddress+0xae [minkernel\ntos\rtl\waitaddr.c @ 1094] 
03 00000035`1154f1c0 00007ff9`b1fd15b4 ntdll!RtlpWaitOnCriticalSection+0xfd [minkernel\ntos\rtl\resource.c @ 1610] 
04 00000035`1154f2a0 00007ff9`b1fd13e2 ntdll!RtlpEnterCriticalSectionContended+0x1c4 [minkernel\ntos\rtl\resource.c @ 2317] 
05 00000035`1154f300 00007ff9`6379d80a ntdll!RtlEnterCriticalSection+0x42 [minkernel\ntos\rtl\resource.c @ 1923] 
06 00000035`1154f330 00007ff9`6376742f coreclr!CrstBase::Enter+0x5a [F:\workspace\_work\1\s\src\coreclr\src\vm\crst.cpp @ 330] 
07 (Inline Function) --------`-------- coreclr!Thread::HasThreadStateOpportunistic+0xc [F:\workspace\_work\1\s\src\coreclr\src\vm\threads.h @ 1290] 
08 (Inline Function) --------`-------- coreclr!CrstBase::CrstAndForbidSuspendForDebuggerHolder::{ctor}+0x50 [F:\workspace\_work\1\s\src\coreclr\src\vm\crst.cpp @ 823] 
09 00000035`1154f360 00007ff9`637c7830 coreclr!MethodDescBackpatchInfoTracker::ConditionalLockHolderForGCCoop::ConditionalLockHolderForGCCoop+0x67 [F:\workspace\_work\1\s\src\coreclr\src\vm\methoddescbackpatchinfo.h @ 134] 
0a 00000035`1154f3a0 00007ff9`63773078 coreclr!TieredCompilationManager::DeactivateTieringDelay+0x1d0 [F:\workspace\_work\1\s\src\coreclr\src\vm\tieredcompilation.cpp @ 451] 
0b 00000035`1154f710 00007ff9`637d978c coreclr!ThreadpoolMgr::AsyncTimerCallbackCompletion+0x38 [F:\workspace\_work\1\s\src\coreclr\src\vm\win32threadpool.cpp @ 4740] 
0c 00000035`1154f740 00007ff9`637778ac coreclr!UnManagedPerAppDomainTPCount::DispatchWorkItem+0x15c [F:\workspace\_work\1\s\src\coreclr\src\vm\threadpoolrequest.cpp @ 482] 
0d (Inline Function) --------`-------- coreclr!ThreadpoolMgr::ExecuteWorkRequest+0x1cf [F:\workspace\_work\1\s\src\coreclr\src\vm\win32threadpool.cpp @ 1552] 
0e 00000035`1154f7e0 00007ff9`b1636fd4 coreclr!ThreadpoolMgr::WorkerThreadStart+0x33c [F:\workspace\_work\1\s\src\coreclr\src\vm\win32threadpool.cpp @ 1977] 
0f 00000035`1154f930 00007ff9`b1ffcec1 kernel32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64] 
10 00000035`1154f960 00000000`00000000 ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153] 

hoyosjs (Member) commented Sep 18, 2020

This looks like another case of #38736. We start choking on heavily async processes while trying to make sure that evaluating properties and functions won't deadlock when we try to freeze the process. However, we often do so too eagerly. That issue is open to relax the conditions under which we emit the notifications.

I don't have a good solution for your perf issue under the debugger. The only workaround I know of is to reduce the number of events getting generated. A trace will tell us for sure if it's the logging machinery as you believe (and if it's even the notifications, as I'm thinking).
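For example, something along these lines could confirm it (a sketch - the provider name and options are suggestions, not a required command):

dotnet-trace collect -p <pid> --profile cpu-sampling --providers Microsoft-Extensions-Logging

Press Enter or Ctrl+C to stop, then open the resulting .nettrace file in PerfView or Visual Studio.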

joakimriedel (Author) commented:

@tommcdon COMPLUS_TieredCompilation does not seem to make any difference.

@hoyosjs It's not a perf issue - CPU usage seems low. The problem is that my process hangs/deadlocks, so I have to force-quit and restart debugging.

joakimriedel (Author) commented:

@hoyosjs as you can see in the attached image in the first post, CPU is less than 10% when it hangs.

But you are right that it is a heavily async process. In the huge call stack that I only supplied the top rows of above, there are 15+ System.Threading.Tasks.Task.RunContinuations frames related to AspNetCore, SignalR Core, and EF Core async methods.

kouvel (Member) commented Sep 18, 2020

It looks like the thread that holds the lock above is stuck trying to enter cooperative GC mode:

0:118> kn 10
 # Child-SP          RetAddr           Call Site
00 00000035`5210bd68 00007ff9`af73818e ntdll!ZwDelayExecution+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 595] 
01 00000035`5210bd70 00007ff9`638bdce7 KERNELBASE!SleepEx+0x9e [minkernel\kernelbase\thread.c @ 2425] 
02 (Inline Function) --------`-------- coreclr!ClrSleepEx+0xb [F:\workspace\_work\1\s\src\coreclr\src\vm\hosting.cpp @ 259] 
03 00000035`5210be10 00007ff9`63751e30 coreclr!__SwitchToThread+0x1459df [F:\workspace\_work\1\s\src\coreclr\src\vm\hosting.cpp @ 310] 
04 00000035`5210be40 00007ff9`637db58c coreclr!Thread::RareDisablePreemptiveGC+0x2d0 [F:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 2378] 
05 (Inline Function) --------`-------- coreclr!Thread::DisablePreemptiveGC+0x1f [F:\workspace\_work\1\s\src\coreclr\src\vm\threads.h @ 1974] 
06 (Inline Function) --------`-------- coreclr!GCHolderBase::EnterInternalCoop+0x37 [F:\workspace\_work\1\s\src\coreclr\src\vm\threads.h @ 5486] 
07 00000035`5210beb0 00007ff9`637c4bf7 coreclr!GCCoop::GCCoop+0x54 [F:\workspace\_work\1\s\src\coreclr\src\vm\threads.h @ 5606] 
08 00000035`5210bee0 00007ff9`637cafdf coreclr!MethodDescBackpatchInfoTracker::Backpatch_Locked+0x1f [F:\workspace\_work\1\s\src\coreclr\src\vm\methoddescbackpatchinfo.cpp @ 77] 
09 (Inline Function) --------`-------- coreclr!MethodDesc::TryBackpatchEntryPointSlots+0x76 [F:\workspace\_work\1\s\src\coreclr\src\vm\method.cpp @ 4998] 
0a (Inline Function) --------`-------- coreclr!MethodDesc::TryBackpatchEntryPointSlotsFromPrestub+0x76 [F:\workspace\_work\1\s\src\coreclr\src\vm\method.hpp @ 1381] 
0b (Inline Function) --------`-------- coreclr!MethodDesc::TrySetInitialCodeEntryPointForVersionableMethod+0x675 [F:\workspace\_work\1\s\src\coreclr\src\vm\method.cpp @ 5018] 
0c 00000035`5210bf20 00007ff9`637ca37c coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x97f [F:\workspace\_work\1\s\src\coreclr\src\vm\codeversion.cpp @ 1777] 
0d 00000035`5210c3b0 00007ff9`637ca0d1 coreclr!MethodDesc::DoPrestub+0x16c [F:\workspace\_work\1\s\src\coreclr\src\vm\prestub.cpp @ 2127] 
0e 00000035`5210c4d0 00007ff9`63859155 coreclr!PreStubWorker+0x231 [F:\workspace\_work\1\s\src\coreclr\src\vm\prestub.cpp @ 1952] 
0f 00000035`5210c670 00007ff9`50544271 coreclr!ThePreStub+0x55 [F:\workspace\_work\1\s\src\coreclr\src\vm\amd64\ThePreStubAMD64.asm @ 21] 

I think #40060 may have missed this spin-wait that is blocking switching GC modes. I'll look into fixing this.

In the meantime, it should be possible to work around this by disabling tiered compilation when debugging. This can be done in the project file of the web app as below. After a clean build the next launch should make the config effective.

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TieredCompilation>false</TieredCompilation>
  </PropertyGroup>
</Project>

joakimriedel (Author) commented Sep 19, 2020

Thanks @kouvel, but unfortunately it still reproduces with TieredCompilation set to false. I also tried the environment variable COMPLUS_TieredCompilation pointed out by @tommcdon, to no avail.

My problem is that I am debugging some hard query issues in EF Core 5.0 RC1 where I need more verbose log output, but setting the LogLevel to Debug hangs the debugger due to this regression. I hope you will find a solution to this.

I am surprised not to see many others affected by this. Is this an AMD-specific problem when running on a Threadripper? I will try our solution on an Intel machine and see if it still reproduces.

EDIT: No. It reproduces on an Intel machine as well, with slightly different call stacks.

Is this the SpinWait you referred to?

OS Thread Id: 0x336c
        Child SP               IP Call Site
000000C85F37F478                  [InlinedCallFrame: 000000c85f37f478] System.Private.CoreLib.dll!System.Threading.Thread.YieldInternal()
000000C85F37F478                  [InlinedCallFrame: 000000c85f37f478] System.Private.CoreLib.dll!System.Threading.Thread.YieldInternal()
000000C85F37F450 00007FF81A81B9FD System.Private.CoreLib.dll!System.Threading.SpinWait.SpinOnceCore(Int32) + 237 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SpinWait.cs @ 241]
000000C85F37F500 00007FF81A82B33D System.Private.CoreLib.dll!System.Threading.SemaphoreSlim.Wait(Int32, System.Threading.CancellationToken) + 221 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs @ 432]
OS Thread Id: 0x4824
        Child SP               IP Call Site
000000C8624ED798                  [HelperMethodFrame_2OBJ: 000000c8624ed798] System.Diagnostics.Debugger.Log(Int32, System.String, System.String)
000000C8624EDAD0 00007FF81A83E025 System.Private.CoreLib.dll!System.Diagnostics.DebugProvider.WriteCore(System.String) + 133 [/_/src/libraries/System.Private.CoreLib/src/System/Diagnostics/DebugProvider.Windows.cs @ 62]
000000C8624EDB40 00007FF81A83DE66 System.Private.CoreLib.dll!System.Diagnostics.DebugProvider.Write(System.String) + 134 [/_/src/libraries/System.Private.CoreLib/src/System/Diagnostics/DebugProvider.cs @ 64]
000000C8624EDBA0 00007FF81A8681FC Microsoft.Extensions.Logging.Debug.dll!Microsoft.Extensions.Logging.Debug.DebugLogger.Log[[Microsoft.Extensions.Logging.LoggerMessage+LogValues`2[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]], Microsoft.Extensions.Logging.Abstractions]](Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, LogValues`2<System.__Canon,System.__Canon>, System.Exception, System.Func`3<LogValues`2<System.__Canon,System.__Canon>,System.Exception,System.String>) + 284 [/_/src/libraries/Microsoft.Extensions.Logging.Debug/src/DebugLogger.cs @ 67]
OS Thread Id: 0xd14
        Child SP               IP Call Site
000000C86C80E460                  [InlinedCallFrame: 000000c86c80e460] System.Data.SqlClient.dll!System.Data.SqlClient.SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Data.SqlClient.SNIHandle, IntPtr ByRef, Int32)
000000C86C80E460                  [InlinedCallFrame: 000000c86c80e460] System.Data.SqlClient.dll!System.Data.SqlClient.SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Data.SqlClient.SNIHandle, IntPtr ByRef, Int32)
000000C86C80E430 00007FF818D739D4 Microsoft.AspNetCore.Server.IIS.dll!ILStubClass.IL_STUB_PInvoke(System.Data.SqlClient.SNIHandle, IntPtr ByRef, Int32) + 164

I also find that last one interesting in light of various deadlock reports in SqlClient:

dotnet/SqlClient#262
dotnet/SqlClient#425

kouvel (Member) commented Sep 19, 2020

Hmm, strange. Could you please share a heap dump of the IIS Express process while VS is in the hung state, with TieredCompilation set to false in the project file (in the same feedback ticket)? I just want to see if the setting is effective in the process and whether there is maybe a different type of hang also happening.

kouvel (Member) commented Sep 19, 2020

Is this the SpinWait you referred to?

The spin-wait I was referring to above is in coreclr.dll here:

0:118> kn 10
 # Child-SP          RetAddr           Call Site
00 00000035`5210bd68 00007ff9`af73818e ntdll!ZwDelayExecution+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 595] 
01 00000035`5210bd70 00007ff9`638bdce7 KERNELBASE!SleepEx+0x9e [minkernel\kernelbase\thread.c @ 2425] 
02 (Inline Function) --------`-------- coreclr!ClrSleepEx+0xb [F:\workspace\_work\1\s\src\coreclr\src\vm\hosting.cpp @ 259] 
03 00000035`5210be10 00007ff9`63751e30 coreclr!__SwitchToThread+0x1459df [F:\workspace\_work\1\s\src\coreclr\src\vm\hosting.cpp @ 310] 

I think that is still an issue, but perhaps there are other issues involved, or for some reason the setting to turn off tiering is not working. Hopefully the new heap dump will provide more info.

joakimriedel (Author) commented:

@kouvel I made another dump with tiering off.

Unfortunately I cannot seem to edit the feedback ticket that I opened through VS2019.

I'm not sure what kind of details would be exposed in a dump like this, so I password-protected the link. Send me an email or suggest another way to get you the password.

https://1drv.ms/u/s!AqY6E4IHKYsBgqkwad1C8WTGtJIz7Q?e=bhcqFB

kouvel (Member) commented Sep 20, 2020

Thanks @joakimriedel - I couldn't find your e-mail address. Can you e-mail me at kouvel@microsoft.com?

joakimriedel (Author) commented:

@kouvel you've got mail.

kouvel (Member) commented Sep 21, 2020

It looks like the same underlying issue; I forgot that the problematic code path is taken in another case. Could you please try with COMPlus_TieredCompilation=0 and COMPlus_ProfApi_RejitOnAttach=0 in the startup environment variables? I don't think the latter option has a project file setting, so it might be easier to use environment variables for both.
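For the Visual Studio / IIS Express launch case, one place to set these is the environmentVariables section of Properties/launchSettings.json (a sketch - the profile name below is just an example from a typical template; use whichever profile you launch with):

"profiles": {
  "IIS Express": {
    "commandName": "IISExpress",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "COMPlus_TieredCompilation": "0",
      "COMPlus_ProfApi_RejitOnAttach": "0"
    }
  }
}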

jeffschwMSFT modified the milestones: 5.0.0, 5.0.x on Oct 22, 2020
joakimriedel (Author) commented:

Any estimate on when this might be patched in a servicing release?

The debugging experience in .NET Core 5 is very frustrating since it hangs at least once every three hours or so at random places, something that never happened in earlier versions.

Restarting the debugging session is a simple workaround, but every time I lose process state (and time).

kouvel (Member) commented Nov 5, 2020

This one seems to be a bit more complicated than the other one. We're currently discussing solutions; at the moment I'm reasonably confident that something can be done to resolve the deadlock, even if the diagnostics experience is a bit worse when the rare case does happen. Still aiming for the first servicing release.

kouvel (Member) commented Nov 5, 2020

@jeffschwMSFT do you happen to know when the first servicing release for 5.0 would be released?

jeffschwMSFT (Member) commented:

We are taking issues for consideration now for 5.0.1.

joakimriedel (Author) commented:

Thanks for investigating, @kouvel!

Out of curiosity: judging from the responses to this issue, not many others seem to be affected by this bug. Our application is a pretty standard .NET Core solution with MVC and Web API - why do I hit this edge case and not other people with a similar setup?

kouvel (Member) commented Nov 6, 2020

The timing window is typically short. A thread checks that debugger suspension is not in progress; soon afterwards the debugger asks to suspend the runtime; then the thread may, for example, trigger a GC by allocating a few bytes when calling a virtual or interface method through a new type for the first time, which is also rare. The timing window may increase if there are more threads to suspend.

kouvel added a commit to kouvel/runtime that referenced this issue Nov 11, 2020
…ding in some cases

1. When suspending for the debugger is in progress (the debugger is waiting for some threads to reach a safe point for suspension), a thread that is not yet suspended may trigger another runtime suspension. This is currently not allowed because the order of operations conflicts with requirements to send GC events for managed data breakpoints to work correctly when suspending for a GC. Instead, the thread suspends for the debugger first, and after the runtime is resumed, continues suspending for GC.
2. At the same time, if the thread that is not suspended yet is in a forbid-suspend-for-debugger region, it cannot suspend for the debugger, which conflicts with the above scenario, but is currently necessary for the issue fixed by dotnet#40060
3. The current plan is to change managed data breakpoints implementation to pin objects instead of using GC events to track object relocation, and to deprecate the GC events APIs
4. With that, the requirement in #1 goes away, so this change conditions the check to avoid suspending the runtime during a pending suspension for the debugger when GC events are not enabled

- Verified that the latest deadlock seen in dotnet#42375 manifests only when a data breakpoint is set, and not otherwise
- Combined with dotnet#44471 and a VS update to use that to switch to the pinning mechanism, the deadlock issue seen above should disappear completely
lcrumbling commented Nov 18, 2020

This GitHub issue was difficult to find in Google results. We are experiencing similar hangs, where debugging hangs and the memory and CPU graphs come to a halt. We experienced it while debugging both a console application and an ASP.NET application, both under .NET 5.0 and both with EF Core 5.0.

hoyosjs (Member) commented Nov 18, 2020

@lcrumbling this fix will probably be available in 5.0.1 if it's this same issue. It's currently getting ported to servicing. Sorry for the inconvenience.

marklio added the tenet-reliability label on Nov 24, 2020
kouvel (Member) commented Nov 25, 2020

Should be fixed by #44563, which is expected to be included in 5.0.1

kouvel closed this as completed on Nov 25, 2020
MikaelFerland commented:

This issue is fixed in the 5.0.100 .NET 5 SDK, right? Because something similar occurs on my machine.

hoyosjs (Member) commented Dec 8, 2020

@MikaelFerland there's a rare case where it's not fixed in 5.0.100. That fix will be released in 5.0.101. If you can grab a dump using something like procdump -ma <PID>, we could take a look, though it's most likely the same issue. I'd only expect this deadlock to be seen under VS on Windows.
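For example (the PID and file name are placeholders, and you may need an elevated prompt to dump the worker process):

procdump -ma <PID> hang.dmp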

MikaelFerland commented:

@hoyosjs thank you, I got 3 dumps: one of VS (devenv.exe), and the two others are my processes. If the deadlock is happening in VS, you need the devenv.exe dump, right?
Also, where can I send that 2.44 GB file?
Thank you!

hoyosjs (Member) commented Dec 8, 2020

If you have something like OneDrive, you could upload them there and protect the link so it is shared only with juan.hoyos at microsoft dot com. I'll let you know once I have them. Are your processes x64 (with the exception of devenv)?

MikaelFerland commented:

They are x64 processes; you should have received the link.

hoyosjs (Member) commented Dec 8, 2020

@MikaelFerland I tried this, and sadly I need the app-side dumps to see this. You can stop sharing the OneDrive link. You can upload the files securely at https://developercommunity2.visualstudio.com/t/Debugger-hangs-sporadically-in-Visual-St/1187332

MikaelFerland commented:

@hoyosjs The files have been uploaded!

gobananasgo commented:

I just encountered the same issue as the above users.
We have updated our project from dotnet 1.0 to 5.0 following the upgrade guide on the website.
https://docs.microsoft.com/en-us/aspnet/core/migration/31-to-50?view=aspnetcore-5.0&tabs=visual-studio

I have another project to upgrade as well. It will be interesting to see if it encounters the same issue or if it is OK.

mangod9 (Member) commented Jan 4, 2021

@gobananasgo are you hitting it with .NET SDK version 5.0.100?

gobananasgo commented:

@mangod9
I am using 5.0.101

sossee2 commented Jan 20, 2021

[screenshot]
Not sure if it's exactly the same issue, but I see the same symptoms, with the exception that this happens reliably as soon as I fire up the hundred or so client WebSocket connections in my app. The app slows to a crawl, the Diag Tools screen almost stops, and many threads are in "System.Private.CoreLib.dll!System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()" as per the screenshot. It started happening following my port from .NET Framework to .NET 5.0 (latest GA build).

kouvel (Member) commented Jan 20, 2021

@sossee2 that sounds like a different issue; could you please open a new issue? It might be useful to have access to a crash dump as well.

ghost locked as resolved and limited conversation to collaborators on Feb 19, 2021