Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/6.0] Fix thread-safety issues with enumerating ResourceManager. #81281

Merged
merged 9 commits into from
Mar 10, 2023

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Jan 27, 2023

Backport of #75054 to release/6.0

/cc @buyaa-n @madelson

Customer Impact

The regression introduced when the scope of the lock reduced from reader to cache and as @madelson mentioned the deadlock happening because:

A customer who is migrating from 3.1 to 6.0 is blocked by a deadlock issue that is a regression from 3.1 => 6.0. This fixed in .NET 8.0 but as .NET 3.1 is reached end of life more customers might be migrating from .NET 3.1 to 6.0 or 7.0. and encounter this issue.

There is a workaround to use a new ResourceManager instance when planning to call GetResourceSet in different threads rather than re-using a shared instance, but not desirable.

Testing

  • Unit test added with the fix
  • Manually tested the repro provided in the issue with the bits produced by this PR

Risk

Low, the fix is removing the outer lock from ResourceReader.ResourceEnumerator and instead adding a granular lock for each method that use the shared _store and misses the lock in the ResourceReader (previously some methods had lock, some not, for which the caller putting a lock to the reader). Also, it adds some Debug.Asserts to make the locking acquired appropriately and corresponding unit test that reproes the issue.

Concurrently enumerating a ResourceManager while also calling GetString()
and similar methods was prone to both transient errors and deadlock.

The transient errors were caused by RuntimeResourceSet calling internal
methods on ResourceReader that did not properly lock. Now, all exposed
methods on the Reader are thread-safe.

The deadlock was called by inconsistent lock ordering between
ResourceReader.ResourceEnumerator and RuntimeResourceSet which both lock
on the RuntimeResourceSet's cache and on the ResourceReader itself. Now,
the enumerator does not need to take both locks at the same time.

Fix #74052
Fix #74868
@ghost
Copy link

ghost commented Jan 27, 2023

Tagging subscribers to this area: @dotnet/area-system-resources
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #75054 to release/6.0

/cc @buyaa-n @madelson

Customer Impact

Testing

Risk

IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-System.Resources

Milestone: -

@buyaa-n buyaa-n added this to the 6.0.x milestone Jan 31, 2023
@buyaa-n buyaa-n added the Servicing-consider Issue for next servicing release review label Jan 31, 2023
@buyaa-n buyaa-n removed the Servicing-consider Issue for next servicing release review label Jan 31, 2023
@ericstj ericstj added the Servicing-consider Issue for next servicing release review label Jan 31, 2023
@buyaa-n
Copy link
Contributor

buyaa-n commented Jan 31, 2023

There is one failure that is related: /~https://github.com/dotnet/runtime/pull/81281/checks?check_run_id=10937934301 which is related to intermittent timeout, we need to add the fix for that

@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jan 31, 2023
@rbhanda rbhanda modified the milestones: 6.0.x, 6.0.15 Jan 31, 2023
@buyaa-n buyaa-n added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jan 31, 2023
@ericstj
Copy link
Member

ericstj commented Jan 31, 2023

This is approved, contingent upon partner signoff that the fix resolves their issue. Remove the NO-MERGE once we have signoff from the impacted customer.

@madelson
Copy link
Contributor

madelson commented Feb 1, 2023

This is approved, contingent upon partner signoff that the fix resolves their issue

@ericstj Am I the "partner" in this case? If so, are there instructions for running my test code against this backport branch?

@buyaa-n
Copy link
Contributor

buyaa-n commented Feb 6, 2023

Am I the "partner" in this case?

Not really :), there is a partner that reported they have blocked by similar issue while migrating from 3.1 to 6.0, they need to confirm the fix

@carlossanlop
Copy link
Member

carlossanlop commented Feb 8, 2023

@buyaa-n please ping me when this is ready to merge.

Can you please confirm that the CI failures are unrelated to this change? If the logs are gone, please close and reopen the PR to rebase on top of the latest bits in the 6.0 branch, because I merged a few infra fixes.

And please get a code review sign-off.

@carlossanlop
Copy link
Member

I just merged some additional helix queue fixes causing arm* failures. We can close/reopen to get cleaner CI.

@carlossanlop carlossanlop reopened this Feb 9, 2023
@buyaa-n
Copy link
Contributor

buyaa-n commented Feb 13, 2023

This is approved, contingent upon partner signoff that the fix resolves their issue. Remove the NO-MERGE once we have signoff from the impacted customer.

As we did not get the partner signoff and it is the last day for 6.0.15 release code compete this fix not gonna happen in 6.0.15 release.

Though we still think the regression should be fixed in 6.0/7.0, so we will conduct some more testing especially in perf side and ask for porting in next release without customer sign off. Will not close the PR but removing the approval and changing the milestone.

@buyaa-n buyaa-n removed the Servicing-approved Approved for servicing release label Feb 13, 2023
@buyaa-n buyaa-n modified the milestones: 6.0.15, 6.0.x Feb 13, 2023
@ericstj ericstj added Servicing-consider Issue for next servicing release review and removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) labels Mar 9, 2023
@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Mar 9, 2023
@leecow leecow modified the milestones: 6.0.x, 6.0.16 Mar 9, 2023
@madelson
Copy link
Contributor

@buyaa-n which release will contain this fix?

@carlossanlop
Copy link
Member

@madelson this will go into the April 2023 Servicing Release. The GA date is April 11th: https://dev.azure.com/devdiv/DevDiv/_wiki/wikis/DevDiv.wiki/33408/April-2023

@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Resources Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants