-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Func-eval while runtime is suspended may not run due to deadlock (timing issue) #1537
Comments
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation. - Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock. - The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method. - The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing Fix for dotnet#1537
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation. - Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock. Risks: - The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method. - The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing - Low risk for the change itself Port of dotnet/runtime#2380 Fix for dotnet/runtime#1537
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation. - Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock. - The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method. - The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing Fix for #1537
…28006) - In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation. - Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock. Risks: - The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method. - The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing - Low risk for the change itself Port of dotnet/runtime#2380 Fix for dotnet/runtime#1537
If this is the "To prevent an unsafe abort" root cause, is there a known workaround at this time? |
Disabling tiered compilation should help when debugging, although it can't be disabled if attaching to a running app that already has it enabled. Tiered compilation may be disabled in the following ways: Project file (with a clean build after changes to the property):
.runtimeconfig.json
Environment variable:
|
Fixed in 3.1 with dotnet/coreclr#28006, and fixed in master with #32250 (original PR #2380) |
One thread may be doing:
TieredCompilationManager::TieringDelayTimerCallbackWorker
TieredCompilationManager::ResumeCountingCalls
MethodDesc::BackpatchToResetEntryPointSlots
GCCoop::GCCoop
After runtime suspension, the func-eval thread may be doing:
MethodDesc::DoPrestub
CodeVersionManager::PublishVersionableCodeIfNecessary
MethodDescBackpatchInfoTracker::ConditionalLockHolder::ConditionalLockHolder
It cannot acquire the lock and the func-eval times out. There doesn't seem to be any other instance of notifying the debugger of a cross-thread dependency (though something like
NotifyOfCrossThreadDependency
) from the VM side. It still wouldn't allow the func-eval to run anyway.Best way to fix probably would be to not enter cooperative GC mode while holding a regular lock, such that the runtime cannot be suspended while holding the lock. This would mean entering cooperative GC mode before acquiring the lock, requiring that the lock be an unsafe any-mode lock, which in turn means that no
GC_TRIGGERS
work may be done inside the lock. May be doable, need to see what it would entail.The text was updated successfully, but these errors were encountered: