concurrency/mutexmap Move Unlock to after operation #101

Merged: 1 commit, merged on Sep 9, 2024

JoshVanL (Contributor)

Description

Please explain the changes you've made

Issue reference

We strive to have all PRs opened based on an issue, where the problem or feature has been discussed prior to implementation.

Please reference the issue this PR will close: #[issue number]

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests

Signed-off-by: joshvanl <me@joshvanl.dev>
JoshVanL requested review from a team as code owners on August 19, 2024 15:58

artursouza (Member) left a comment

Can you provide more context on why this change is needed? The previous design separated the lock on the map from the lock on each mutex.

artursouza (Member)

Can we have a unit test that runs multiple operations in parallel to try to trigger a deadlock? I don't disagree that there is a bug, but I am also concerned that this PR bundles the two locks together instead of keeping them separate.

elena-kolevska (Contributor)

LGTM, but I agree we should extend the existing concurrency tests to cover the delete-unlock operations.

JoshVanL (Contributor, Author)

Because this is a true race condition that depends on the exact order in which lines of code (LOC) execute across goroutines, I'm not sure how to write a sensible unit test for it.

artursouza (Member)

It is hard to make a deterministic test for this. Is it possible to write a test that never produces false positives? That is, the race condition would very likely (though not certainly) make the test fail, while without the race condition the test would pass 100% of the time. That way, we can run the test a few times to make sure. It is not ideal, but IMO it is better than visual inspection (i.e., code review).

Is there another layer (runtime, maybe) where it can be tested?
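A minimal Go sketch of the probabilistic test described above follows. The actual mutexmap code is not shown in this thread, so the type under test, keyedMutex, is a hypothetical stand-in that swaps in reference counting: each entry records how many goroutines hold or wait on it and is dropped on the last Unlock, instead of through an explicit delete call. The map lock is held only across non-blocking steps (lookup, refcount update, and the per-key Unlock), never while waiting in the per-key Lock, which is one way to keep the two locks separate where deadlocks are concerned. The hypothetical stress function hammers a small key space from many goroutines and counts how often two goroutines are inside the same key's critical section at once. A correct implementation should report zero on every run, while a racy one is likely, but not certain, to report violations or crash. It assumes Go 1.19 or later for atomic.Int32 and atomic.Int64.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// refMutex is a per-key mutex plus the number of goroutines that currently
// hold it or are waiting for it. refs is guarded by keyedMutex.mu.
type refMutex struct {
	sync.Mutex
	refs int
}

// keyedMutex is a hypothetical stand-in for a mutex map. Entries are
// reference counted and removed on the last Unlock, instead of via an
// explicit delete call.
type keyedMutex struct {
	mu    sync.Mutex
	items map[int]*refMutex
}

// Lock registers interest in key under the map lock, then waits on the
// per-key mutex with the map lock released.
func (k *keyedMutex) Lock(key int) {
	k.mu.Lock()
	e, ok := k.items[key]
	if !ok {
		e = &refMutex{}
		k.items[key] = e
	}
	e.refs++ // while refs > 0 the entry stays in the map
	k.mu.Unlock()
	e.Lock() // may block; the map lock is not held here
}

// Unlock releases key. The per-key Unlock runs while the map lock is held;
// it never blocks, so this cannot deadlock. Only a goroutine that holds key
// may call Unlock; otherwise e may be nil or belong to another holder.
func (k *keyedMutex) Unlock(key int) {
	k.mu.Lock()
	defer k.mu.Unlock()
	e := k.items[key]
	e.Unlock()
	e.refs--
	if e.refs == 0 {
		delete(k.items, key) // last user gone: safe to drop the entry
	}
}

// stress runs workers goroutines, each locking and unlocking keys from a
// small key space iters times. It returns how often two goroutines were
// inside the same key's critical section at once, and how many map
// entries were left behind.
func stress(workers, iters, keys int) (violations int64, leftover int) {
	k := &keyedMutex{items: make(map[int]*refMutex)}
	inside := make([]atomic.Int32, keys) // goroutines currently inside each key
	var bad atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				key := (w + i) % keys
				k.Lock(key)
				if inside[key].Add(1) != 1 {
					bad.Add(1) // another goroutine is inside the same key
				}
				inside[key].Add(-1)
				k.Unlock(key)
			}
		}(w)
	}
	wg.Wait()
	return bad.Load(), len(k.items)
}

func main() {
	violations, leftover := stress(32, 2000, 4)
	fmt.Println("violations:", violations, "leftover entries:", leftover)
}
```

For this implementation the program should print "violations: 0 leftover entries: 0". In a real suite the body of main would sit in a _test.go file and be run repeatedly under the race detector, for example with go test -race -count=20. A buggy map may not trip the counter at all: Go reports misuse such as unlocking an unlocked mutex as an unrecoverable fatal error, which still fails the run.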

JoshVanL (Contributor, Author) commented on Sep 2, 2024

I can look into doing that. This bug is currently manifesting as failures in the Dapr integration tests.

artursouza (Member) left a comment

Approved under the assumption that integration tests will show the fix.

JoshVanL (Contributor, Author) commented on Sep 9, 2024

Here is an example of a dapr/dapr integration test surfacing a variation of this bug: "fatal error: sync: RUnlock of unlocked RWMutex".

https://github.com/dapr/dapr/actions/runs/10780003020/job/29895148587?pr=8066
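The following sketch shows one interleaving that can produce this class of error when the map lock guards only the lookup and not the per-key lock operations. The naiveMap type and the staleLookup function are hypothetical simplifications, not the dapr/kit code. staleLookup replays, on a single goroutine, the order in which three goroutines A, B and C could run, so the sketch is deterministic even though the real race is not.

```go
package main

import (
	"fmt"
	"sync"
)

// naiveMap is a hypothetical, simplified key->mutex map in which the map
// lock only guards lookups, never the per-key lock operations.
type naiveMap struct {
	lock  sync.Mutex
	items map[string]*sync.RWMutex
}

// lookup returns the mutex currently stored for key, creating one if absent.
func (m *naiveMap) lookup(key string) *sync.RWMutex {
	m.lock.Lock()
	defer m.lock.Unlock()
	mu, ok := m.items[key]
	if !ok {
		mu = &sync.RWMutex{}
		m.items[key] = mu
	}
	return mu
}

// remove drops key from the map without touching its mutex.
func (m *naiveMap) remove(key string) {
	m.lock.Lock()
	defer m.lock.Unlock()
	delete(m.items, key)
}

// staleLookup replays one interleaving of three goroutines (A, B, C) on a
// single goroutine and reports whether A's unlock-time lookup returns the
// mutex that A actually locked.
func staleLookup() bool {
	m := &naiveMap{items: make(map[string]*sync.RWMutex)}

	a := m.lookup("k") // A: RLock("k") looks up (creates) the mutex...
	a.RLock()          // ...and takes a read lock on it.

	m.remove("k") // B: deletes "k" while A still holds its read lock.

	c := m.lookup("k") // C: RLock("k") finds no entry and creates a new mutex.
	c.RLock()

	found := m.lookup("k") // A: RUnlock("k") looks the key up again...
	return found == a      // ...and gets C's mutex, not its own.
}

func main() {
	fmt.Println("unlock hits the locked mutex:", staleLookup())
}
```

It prints "unlock hits the locked mutex: false". If A went on to call RUnlock on the mutex it found, it would release C's read lock instead of its own, and C's later RUnlock would abort with "fatal error: sync: RUnlock of unlocked RWMutex". The sketch stops before those two calls because that fatal error cannot be recovered.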

artursouza merged commit 502671b into dapr:main on Sep 9, 2024
6 checks passed