libpod: setupNetNS() correctly mount netns #24024

Merged
merged 2 commits into containers:main from netns-dir
Sep 20, 2024

Conversation

Luap99
Member

@Luap99 Luap99 commented Sep 20, 2024

The netns dir has special logic to bind mount itself and make itself
shared. This code here didn't, which led to a catastrophic bug during
netns unmounting: we were unable to unmount the netns because the mount got
duplicated and had the wrong parent mount. This caused us to loop forever
trying to remove the file.

Fixes https://issues.redhat.com/browse/RHEL-59620
Fixes #23685

Does this PR introduce a user-facing change?

Fixed a bug that made it impossible to unmount our netns on cleanup, which caused our commands to hang.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 20, 2024
@giuseppe
Member

I think we need to modify check_netns_files to ignore the mount point itself

@Luap99
Member Author

Luap99 commented Sep 20, 2024

> I think we need to modify check_netns_files to ignore the mount point itself

I don't understand what you are trying to say with that.

@giuseppe
Member

> > I think we need to modify check_netns_files to ignore the mount point itself
>
> I don't understand what you are trying to say with that.

I was looking at the failing test:

[+0814s] # --- /tmp/CI_lOcy/bats-run-jyH39Y/suite/netns-pre	2024-09-20 11:50:25.729240197 +0000
[+0814s] # +++ /tmp/CI_lOcy/bats-run-jyH39Y/suite/netns-post	2024-09-20 12:03:37.699435052 +0000
[+0814s] # @@ -0,0 +1 @@
[+0814s] # +netns-66ea18a9-066d-f924-bf03-f3fed67b8660
[+0814s] # 
[+0814s] # ^^^^^ Leaks found in /run/netns ^^^^^
[+0814s] # bats warning: Executed 333 instead of expected 332 tests
[+0814s] make: *** [Makefile:691: localsystem] Error 1
[12:03:37] END - [+0814s] total duration since 2024-09-20T12:03:37Z

it seems related to this change, and kind of makes sense since we are adding a new mount on /run/netns now, which wasn't done before in the Podman code
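
To see whether a path such as /run/netns carries one mount or several, and which parent each one has, the entries in /proc/self/mountinfo can be inspected. The following Go sketch is an illustration only and is not part of Podman: the `mountEntry` type, the `mountsAt` function, and the sample data are all made up. It relies on the documented mountinfo layout (mount ID, parent ID, major:minor, root, mount point, options, optional fields, then a `-` separator).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mountEntry holds the mountinfo fields relevant to this discussion.
type mountEntry struct {
	ID       int
	ParentID int
	Shared   bool
}

// mountsAt (hypothetical helper) returns every mountinfo entry whose mount
// point equals target. More than one entry means mounts are stacked there.
func mountsAt(mountinfo, target string) ([]mountEntry, error) {
	var entries []mountEntry
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		// Skip blank or malformed lines and mounts at other paths.
		if len(fields) < 7 || fields[4] != target {
			continue
		}
		id, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, fmt.Errorf("parse mount ID %q: %w", fields[0], err)
		}
		parent, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("parse parent ID %q: %w", fields[1], err)
		}
		entry := mountEntry{ID: id, ParentID: parent}
		// Optional fields run from index 6 up to the "-" separator.
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				entry.Shared = true
			}
		}
		entries = append(entries, entry)
	}
	return entries, nil
}

func main() {
	// Made-up sample data showing two mounts stacked on /run/netns.
	const sample = `22 1 0:21 / /run rw,nosuid shared:5 - tmpfs tmpfs rw
101 22 0:21 /netns /run/netns rw,nosuid shared:5 - tmpfs tmpfs rw
102 101 0:21 /netns /run/netns rw,nosuid - tmpfs tmpfs rw`

	entries, err := mountsAt(sample, "/run/netns")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("id=%d parent=%d shared=%v\n", e.ID, e.ParentID, e.Shared)
	}
}
```

On a real system, the contents of /proc/self/mountinfo would be read with `os.ReadFile` and passed in as the first argument.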

@Luap99
Member Author

Luap99 commented Sep 20, 2024

I don't think the mount matters in any way for this check; the file is visible with or without the mount. And we already had the mount set up until the first ctr was one with --userns, so it should not have changed much.

The error above clearly complains about netns-66ea18a9-066d-f924-bf03-f3fed67b8660, which is a ctr netns file that we didn't remove (unfortunately there is no way to know which test leaked it, as it is just a random ID).

In general, the leak check was only added two days ago (#2399), so I would not be surprised if this is a pre-existing issue that just causes some flakes. At least, I cannot see anything wrong in the netns creation part that would cause new leaks.
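
The leak check in the log above compares a listing of /run/netns taken before the suite (netns-pre) with one taken after it (netns-post). The real check_netns_files helper is not shown in this thread, so the following Go sketch is only an assumed illustration of that kind of comparison; the function name `leakedEntries` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// leakedEntries (hypothetical helper) returns the names present in post but
// not in pre, i.e. files that appeared during the run and were not removed.
func leakedEntries(pre, post []string) []string {
	seen := make(map[string]struct{}, len(pre))
	for _, name := range pre {
		seen[name] = struct{}{}
	}
	var leaked []string
	for _, name := range post {
		if _, ok := seen[name]; !ok {
			leaked = append(leaked, name)
		}
	}
	// Sort so the report is stable regardless of listing order.
	sort.Strings(leaked)
	return leaked
}

func main() {
	// Mirrors the log: the pre listing was empty, the post listing had one file.
	pre := []string{}
	post := []string{"netns-66ea18a9-066d-f924-bf03-f3fed67b8660"}
	for _, name := range leakedEntries(pre, post) {
		fmt.Println("leaked:", name)
	}
}
```

A comparison like this reports only the leftover name, which matches the point above: a random ID alone does not say which test created the file.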

@Luap99
Member Author

Luap99 commented Sep 20, 2024

Test passed on rerun. I will keep an eye out for flakes with the netns cleanup.

To include the pkg/netns changes.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The netns dir has special logic to bind mount itself and make itself
shared. This code here didn't, which led to a catastrophic bug during
netns unmounting: we were unable to unmount the netns because the mount got
duplicated and had the wrong parent mount. This caused us to loop forever
trying to remove the file.

Fixes https://issues.redhat.com/browse/RHEL-59620
Fixes containers#23685

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99 Luap99 marked this pull request as ready for review September 20, 2024 13:20
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 20, 2024
Member

@giuseppe giuseppe left a comment

LGTM

Contributor

openshift-ci bot commented Sep 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99

@mheon
Member

mheon commented Sep 20, 2024

LGTM

@mheon
Member

mheon commented Sep 20, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 20, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 2f44b16 into containers:main Sep 20, 2024
88 checks passed
@Luap99 Luap99 deleted the netns-dir branch September 20, 2024 14:50
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 20, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Dec 20, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Development

Successfully merging this pull request may close these issues.

CI: podman stop: timeout
3 participants