Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/system: podman play --service-container slog to syslog #23450

Closed
wants to merge 2 commits into from

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jul 30, 2024

In theory when syslog is set the cleanup process should log its errors to syslog (journald) so we can have a look at the errors in CI. Without it podman container cleanup errors will never be logged anywhere.

In order to rey to debug #21569

Does this PR introduce a user-facing change?


In theory when syslog is set the cleanup process should log its errors
to syslog (journald) so we can have a look at the errors in CI.
Without it podman container cleanup errors will never be logged anywhere.

In order to rey to debug  containers#21569

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 30, 2024
Copy link
Contributor

openshift-ci bot commented Jul 30, 2024

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Jul 30, 2024
Copy link
Contributor

openshift-ci bot commented Jul 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2024
When using service containers and play kube we create a complicated set
of dependencies.

First in a pod all conmon/container cgroups are part of one slice, that
slice will be removed when the entire pod is stopped resulting in
systemd killing all processes that were part in it.

Now the issue here is around the working of stopPodIfNeeded() and
stopIfOnlyInfraRemains(), once a container is cleaned up it will check
if the pod should be stopped depending on the pod ExitPolicy. If this is
the case it wil stop all containers in that pod. However in our flaky
test we calle podman pod kill which logically killed all containers
already. Thus the logic now thinks on cleanup it must stop the pod and
calls into pod.stopWithTimeout(). Then there we try to stop but because
all containers are already stopped it just throws errors and never gets
to the point were it would call Cleanup(). So the code does not do
cleanup and eventually calls removePodCgroup() which will cause all
conmon and other podman cleanup processes of this pod to be killed.

Thus the podman container cleanup process was likely killed while
actually trying to the the proper cleanup which leaves us in a bad
state.

Following commands such as podman pod rm will try to the cleanup again
as they see it was not completed but then fail as they are unable to
recover from the partial cleanup state.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99 Luap99 closed this Jul 31, 2024
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 30, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant