-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/6.0] Fix thread-safety issues with enumerating ResourceManager. #81281
Conversation
Concurrently enumerating a ResourceManager while also calling GetString() and similar methods was prone to both transient errors and deadlock. The transient errors were caused by RuntimeResourceSet calling internal methods on ResourceReader that did not properly lock. Now, all exposed methods on the Reader are thread-safe. The deadlock was called by inconsistent lock ordering between ResourceReader.ResourceEnumerator and RuntimeResourceSet which both lock on the RuntimeResourceSet's cache and on the ResourceReader itself. Now, the enumerator does not need to take both locks at the same time. Fix #74052 Fix #74868
Tagging subscribers to this area: @dotnet/area-system-resources Issue DetailsBackport of #75054 to release/6.0 Customer ImpactTestingRiskIMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.
|
There is one failure that is related: /~https://github.com/dotnet/runtime/pull/81281/checks?check_run_id=10937934301 which is related to intermittent timeout, we need to add the fix for that |
…numeration (#80330) This raises the timeout to 30s, the same as what we have for the equivalent ResourceManager test (/~https://github.com/dotnet/runtime/blob/15fcb990fe17348ab6ddde0939200b900169920b/src/libraries/System.Resources.ResourceManager/tests/ResourceManagerTests.cs#L255). fix #80277
This is approved, contingent upon partner signoff that the fix resolves their issue. Remove the |
@ericstj Am I the "partner" in this case? If so, are there instructions for running my test code against this backport branch? |
Not really :), there is a partner that reported they have blocked by similar issue while migrating from 3.1 to 6.0, they need to confirm the fix |
@buyaa-n please ping me when this is ready to merge. Can you please confirm that the CI failures are unrelated to this change? If the logs are gone, please close and reopen the PR to rebase on top of the latest bits in the 6.0 branch, because I merged a few infra fixes. And please get a code review sign-off. |
I just merged some additional helix queue fixes causing arm* failures. We can close/reopen to get cleaner CI. |
As we did not get the partner signoff and it is the last day for 6.0.15 release code compete this fix not gonna happen in 6.0.15 release. Though we still think the regression should be fixed in 6.0/7.0, so we will conduct some more testing especially in perf side and ask for porting in next release without customer sign off. Will not close the PR but removing the approval and changing the milestone. |
@buyaa-n which release will contain this fix? |
@madelson this will go into the April 2023 Servicing Release. The GA date is April 11th: https://dev.azure.com/devdiv/DevDiv/_wiki/wikis/DevDiv.wiki/33408/April-2023 |
Backport of #75054 to release/6.0
/cc @buyaa-n @madelson
Customer Impact
The regression introduced when the scope of the lock reduced from reader to cache and as @madelson mentioned the deadlock happening because:
RuntimeResourceSet
, we lock thecache
first and then lock theResourceReader
(from within method called within that lock).runtime/src/libraries/System.Private.CoreLib/src/System/Resources/RuntimeResourceSet.cs
Line 286 in 7db1c33
ResourceReader.ResourceEnumerator
we lock thereader
first and then thecache
:runtime/src/libraries/System.Private.CoreLib/src/System/Resources/ResourceReader.cs
Lines 1082 to 1084 in 7db1c33
A customer who is migrating from 3.1 to 6.0 is blocked by a deadlock issue that is a regression from 3.1 => 6.0. This fixed in .NET 8.0 but as .NET 3.1 is reached end of life more customers might be migrating from .NET 3.1 to 6.0 or 7.0. and encounter this issue.
There is a workaround to use a new
ResourceManager
instance when planning to call GetResourceSet in different threads rather than re-using a shared instance, but not desirable.Testing
Risk
Low, the fix is removing the outer lock from
ResourceReader.ResourceEnumerator
and instead adding a granular lock for each method that use the shared_store
and misses the lock in theResourceReader
(previously some methods had lock, some not, for which the caller putting a lock to the reader). Also, it adds some Debug.Asserts to make the locking acquired appropriately and corresponding unit test that reproes the issue.