-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix netns leak on container creation and exit code 1 on SIGTERM. #24082
Conversation
When we are killed during netns setup it will leak the netns path as it was not commited in the db. This is rather common if you run systemctl stop on a podman systemd unit. Of course we cannot protect against SIGKILL but in systemd case we get SIGTERM and we really should not exit in a critical section like this. Fixes containers#24044 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Currently podman run -d can exit 0 if we send SIGTERM during startup even though the contianer was never started. That just doesn't make any sense is horribly confusing for a external job manager like systemd. The original motivation was to exit 0 for the podman.service in commit ca7376b. That does make sense but it should only do so for the service and only if the server did indeed gracefully shutdown. So we rework how the exit logic works, do not let the handler perform the exit. Instead the shutdown package does the exit after all handlers are run, this solves the issue of ordering. Then we default to exit code 1 like we did before and allow the service exit handler to overwrite the exit code 0 in case of a graceful shutdown. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It is never used and needed so let's just remove some dead code. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@mheon @edsantiago PTAL |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Ephemeral COPR build failed. @containers/packit-build please check. |
LGTM, and I can no longer reproduce the netns leak |
@@ -75,6 +83,7 @@ func Start() error { | |||
} | |||
handlerLock.Unlock() | |||
shutdownInhibit.Unlock() | |||
os.Exit(exitCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this prevent a lot of defer
functions from running? I know I avoided it deliberately when I wrote this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean yes but if we exit what do we want to defer here? And most importantly the current handler also just exited so there should not be any functional difference I would say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we have a bunch of things running in defer that do things like removing files to clean up after ourselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but this PR doesn't change the behavior here. Critical sections where we cannot leak need to use shutdown.Inhibit()
which is how I fixed the issue with the netns leak.
LGTM, but defereing for merge to @mheon given his questions ... |
Saw the netns leak flake today, almost wept in despair, then remembered that this PR hasn't merged yet. |
/lgtm |
857a47d
into
containers:main
The old code did use exit() there is no functional change in that regard. |
LGTM |
libpod: ensure we are not killed during netns creation
When we are killed during netns setup it will leak the netns path as it
was not commited in the db. This is rather common if you run systemctl
stop on a podman systemd unit. Of course we cannot protect against
SIGKILL but in systemd case we get SIGTERM and we really should not exit
in a critical section like this.
Fixes #24044
libpod: rework shutdown handler flow
Currently podman run -d can exit 0 if we send SIGTERM during startup
even though the contianer was never started. That just doesn't make any
sense is horribly confusing for a external job manager like systemd.
The original motivation was to exit 0 for the podman.service in commit
ca7376b. That does make sense but it should only do so for the
service and only if the server did indeed gracefully shutdown.
So we rework how the exit logic works, do not let the handler perform
the exit. Instead the shutdown package does the exit after all handlers
are run, this solves the issue of ordering. Then we default to exit code
1 like we did before and allow the service exit handler to overwrite the
exit code 0 in case of a graceful shutdown.
libpod: remove shutdown.Unregister()
It is never used and needed so let's just remove some dead code.
Does this PR introduce a user-facing change?