-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libpod: wait for healthy on main thread #22658
libpod: wait for healthy on main thread #22658
Conversation
LGTM |
LGTM aside from the unused-arg thing. Do note that there's another |
154924d
to
8395122
Compare
thanks, I've fixed the other occurrences and hopefully added enough new comments |
Ephemeral COPR build failed. @containers/packit-build please check. |
8395122
to
1e9e448
Compare
1e9e448
to
2006ee1
Compare
Cockpit tests failed for commit 2006ee127641f21739952af7272d398094706706. @martinpitt, @jelly, @mvollmer please check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<+015ms> # # podman run --name 2HlcfDHskG --health-cmd=touch /terminate --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c while test \! -e /terminate; do sleep 0.1; done; echo finished
<+0120s> # finished
# timeout: sending signal TERM to command ‘/var/tmp/go/src/github.com/containers/podman/bin/podman’
<+005ms> # [ rc=124 (** EXPECTED 0 **) ]
# *** TIMED OUT ***
New test is flaking?! I don't see anything obvious wrong but we should not merge known flaky test can you check again
I have a slight suspicion that this might not be the fault of the new test; that it might be something in the new images. This happens often when we switch VM images: something subtle breaks. The failure you report is in f40 aarch64. You'll notice that the container DID THE ECHO. That is, as best I can tell, the healthcheck and shell loop and everything worked. Container exit (infrastructure) failed. Then there's the rerun:
I think something is broken in aarch64-land. |
Sure, except that the first two runs both failed on this exact test: I don't care what fault it is we should never merge known flakes, period. |
2006ee1
to
1ce0a98
Compare
wait for the healthy status on the thread where the container lock is held. Otherwise, if it is performed from a go routine, a different thread is used (since the runtime.LockOSThread() call doesn't have any effect), causing pthread_mutex_unlock() to fail with EPERM. Closes: containers#22651 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
1ce0a98
to
c08d4a1
Compare
I've stressed the command locally and I've noticed it hangs sometimes, so perhaps it just happens more easily on aarch64. I've added a patch to address the hang I've seen |
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit c08d4a1fb7f213f4509dc11112ca5ca581dac803. @martinpitt, @jelly, @mvollmer please check. |
Cockpit tests failed for commit 1ce0a9886dec45c6f81c26b8c73c141aa72e3e6e. @martinpitt, @jelly, @mvollmer please check. |
c08d4a1
to
3576c9f
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit 3576c9f20d080136dc08391e9fb0cab95d57e61d. @martinpitt, @jelly, @mvollmer please check. |
do not wait for the healthcheck status to change if the container is stopped. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
3576c9f
to
35375e0
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit 35375e0. @martinpitt, @jelly, @mvollmer please check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giuseppe What is the status? Good to remove the draft or do you plan more changes?
LGTM for reference
yes sorry, forgot to remove the Draft |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
f7a3046
into
containers:main
wait for the healthy status on the thread where the container lock is held. Otherwise, if it is performed from a go routine, a different thread is used (since the runtime.LockOSThread() call doesn't have any effect), causing pthread_mutex_unlock() to fail with EPERM.
Closes: #22651
Does this PR introduce a user-facing change?