Fix race condition (also a regression of the PR 19139) #19221
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19221      +/-   ##
==========================================
- Coverage   68.82%   68.81%   -0.02%
==========================================
  Files         420      420
  Lines       35649    35664      +15
==========================================
+ Hits        24536    24541       +5
- Misses       9692     9697       +5
- Partials     1421     1426       +5
... and 21 files with indirect coverage changes
c76bbeb to b1e5ebc
@fuweid @ivanvc @jmhbnz @serathius This PR fixes a regression caused by #19139, so let's get it merged and backported to 3.5 and probably 3.4. We need to get it included in 3.5.18.
/test pull-etcd-integration-1-cpu-arm64
server/embed/etcd.go (Outdated)
@@ -411,6 +411,16 @@ func (e *Etcd) Close() {
		close(e.stopc)
	})

	for i := range e.Clients {
		if e.Clients[i] != nil {
The Clients here are the net listeners. If we close them at the beginning, all the connections will be closed, and it seems we would be unable to drain the in-flight requests gracefully. The stopServers function is used to shut down gracefully.
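For reference, a graceful drain in Go can be sketched with the standard library alone. The example below is not etcd's stopServers; it only uses net/http's Server.Shutdown, and drainDemo and the 100 ms handler delay are hypothetical names and values picked for illustration. It sends one slow request, calls Shutdown while that request is in flight, and checks that the response still arrives.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// drainDemo (hypothetical helper) starts an HTTP server, issues one slow
// request, and calls Shutdown while that request is still in flight.
// It returns the response body the client received.
func drainDemo() (string, error) {
	started := make(chan struct{})
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // the request is now in flight
		time.Sleep(100 * time.Millisecond) // simulate slow work
		io.WriteString(w, "done")
	})}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	type result struct {
		body string
		err  error
	}
	resc := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String())
		if err != nil {
			resc <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resc <- result{body: string(b), err: err}
	}()

	select {
	case <-started:
	case res := <-resc:
		// The request failed before it reached the handler.
		return res.body, res.err
	}

	// Shutdown closes the listener, then waits for active requests to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	res := <-resc
	return res.body, res.err
}

func main() {
	body, err := drainDemo()
	fmt.Println(body, err)
}
```

Note that Shutdown itself closes the listener before it waits for in-flight requests, so in this standard-library model "stop accepting" and "drain" are two separate steps.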
> The Clients here is about net listener. If we close it at the beginning, all the connections will be closed,
It's intentional.
- We should stop accepting any new connections immediately.
- It won't close the already established connections.
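The two points above can be checked with a small standard-library sketch. This is not etcd code; listenerCloseDemo is a hypothetical helper, and it assumes a plain TCP listener on the loopback interface. It accepts one connection, closes the listener, and then verifies that a new dial fails while the established connection still echoes data.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// listenerCloseDemo (hypothetical helper) accepts one connection, closes the
// listener, and reports whether a new dial failed and what the already
// established connection echoed back.
func listenerCloseDemo() (newDialFailed bool, echoed string, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, "", err
	}
	addr := ln.Addr().String()

	// Establish one connection before the listener is closed.
	client, err := net.Dial("tcp", addr)
	if err != nil {
		return false, "", err
	}
	defer client.Close()
	server, err := ln.Accept()
	if err != nil {
		return false, "", err
	}
	defer server.Close()

	// Stop accepting new connections.
	if err := ln.Close(); err != nil {
		return false, "", err
	}

	// A new dial is expected to fail because nothing listens on addr anymore.
	if c, dialErr := net.Dial("tcp", addr); dialErr != nil {
		newDialFailed = true
	} else {
		c.Close()
	}

	// The established connection still carries data in both directions.
	go func() {
		line, _ := bufio.NewReader(server).ReadString('\n')
		fmt.Fprint(server, line) // echo the line back to the client
	}()
	if _, err := fmt.Fprintln(client, "ping"); err != nil {
		return newDialFailed, "", err
	}
	echoed, err = bufio.NewReader(client).ReadString('\n')
	return newDialFailed, echoed, err
}

func main() {
	failed, echoed, err := listenerCloseDemo()
	fmt.Printf("new dial failed: %v, echoed: %q, err: %v\n", failed, echoed, err)
}
```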
I saw you reverted that change. Would you mind updating the description as well? Thanks.
> When stopping etcd, we should close all listeners and context firstly, afterwards close the etcdserver.
The description has already been updated. Where did you get the description?
In the pull request's description
It's hard to review this without loading a lot of context, and it's not the first time we have had problems with shutdown. I think the problem is the lack of a high-level vision of the shutdown protocol for the server: what the subroutines should do to follow it, and why everything works together. @ahrtr could you add a comment describing the shutdown protocol you have in mind? It should make this easier to review and be useful in the future.
Done. Please see the last commit. cc @fuweid @ivanvc @jmhbnz @serathius
/test pull-etcd-integration-1-cpu-arm64
…te before it returns Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
… the errc Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahrtr, fuweid
Fix #19172
Please review this PR commit by commit.
Three high-level thoughts,
sync.WaitGroup: we should always call wg.Add and wg.Wait in the same goroutine.
cc @serathius @fuweid @ivanvc @jmhbnz @joshuazh-x
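The sync.WaitGroup point can be illustrated with a minimal sketch. This is not code from the PR; runWorkers is a hypothetical helper. The goroutine that will call wg.Wait also calls wg.Add, and does so before starting each worker, so Wait cannot run ahead of a worker that has not registered yet.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers (hypothetical helper) starts n workers and waits for all of
// them. wg.Add is called in the same goroutine that later calls wg.Wait,
// before each worker starts, so the counter never misses a worker.
func runWorkers(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1) // registered by the waiter, not inside the worker
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait() // all Done calls happen before Wait returns
	return total
}

func main() {
	fmt.Println(runWorkers(10)) // prints 55
}
```

If wg.Add were instead called inside each worker goroutine, wg.Wait could return before a late worker had incremented the counter, which is the kind of race the guideline avoids.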