cgroup is not set: internal libpod error after os reboot. #19175

Closed
pjannesen opened this issue Jul 9, 2023 · 9 comments · Fixed by #19189 or #19888
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@pjannesen
Contributor

Issue Description

I have a problem with podman on Alpine Linux. Everything seems to work fine until the operating system reboots. After a reboot I get 'cgroup is not set: internal libpod error'. These errors can be resolved by deleting and recreating the pod.

Steps to reproduce the issue

podman pod create test
podman run --pod=test hello-world
reboot
podman run --pod=test hello-world

Describe the results you received

Error: pod e827e57ccc13b5b556493540f2669b424a07cc1bec63993473414dcc14c9a656 cgroup is not set: internal libpod error

Describe the results you expected

Output of hello-world

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-r1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: unknown'
  cpuUtilization:
    idlePercent: 99.97
    systemPercent: 0.02
    userPercent: 0.01
  cpus: 12
  databaseBackend: boltdb
  distribution:
    distribution: alpine
    version: 3.18.2
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.38-0-lts
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 206336000
  memTotal: 518397952
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-r0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 1155526656
  swapTotal: 1155526656
  uptime: 0h 16m 6.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 6944530432
  graphRootUsed: 270413824
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1688368964
  BuiltTime: Mon Jul  3 07:22:44 2023
  GitCommit: ""
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Alpine Linux 3.18.2, Podman v4.5.1 (I also tested version 3.17, with the same result)

Additional information

No response

@pjannesen pjannesen added the kind/bug Categorizes issue or PR as related to a bug. label Jul 9, 2023
@pjannesen
Contributor Author

Comparing the debug logs between Fedora (systemd) and Alpine (no systemd), it seems that the cgroups are not restored after a reboot.

Fedora:

DEBU[0000] Podman detected system restart - performing state refresh
DEBU[0000] Created cgroup path machine.slice/machine-libpod_pod_5038bb5d816f6e4520979e9bebe29dace692144d66f84adfe4a6dd063630f872.slice for parent machine.slice and name libpod_pod_5038bb5d816f6e4520979e9bebe29dace692144d66f84adfe4a6dd063630f872
DEBU[0000] Created cgroup machine.slice/machine-libpod_pod_5038bb5d816f6e4520979e9bebe29dace692144d66f84adfe4a6dd063630f872.slice
INFO[0000] Setting parallel job count to 37

Alpine:

DEBU[0000] Podman detected system restart - performing state refresh
INFO[0000] Setting parallel job count to 25

@rhatdan
Member

rhatdan commented Jul 10, 2023

@mheon @giuseppe PTAL

@pjannesen
Contributor Author

I applied the following patch and recompiled Podman, and it works now. I don't know if this patch is 100% correct (it's my first time with Go and I don't know anything about the internals of Podman).

 --- a/libpod/pod_internal_linux.go
 +++ b/libpod/pod_internal_linux.go
 @@ -21,7 +21,7 @@ func (p *Pod) platformRefresh() error {
             }
             p.state.CgroupPath = cgroupPath
         case config.CgroupfsCgroupsManager:
 -           if rootless.IsRootless() && isRootlessCgroupSet(p.config.CgroupParent) {
 +           if !rootless.IsRootless() || isRootlessCgroupSet(p.config.CgroupParent) {
                 p.state.CgroupPath = filepath.Join(p.config.CgroupParent, p.ID())
 
                 logrus.Debugf("setting pod cgroup to %s", p.state.CgroupPath)
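
The effect of the changed condition can be checked in isolation. The sketch below is only an illustration: `oldCondition` and `newCondition` are hypothetical stand-ins that take the results of `rootless.IsRootless()` and `isRootlessCgroupSet(p.config.CgroupParent)` as plain booleans, and it is not code from Podman itself. It shows that the old check only set the pod cgroup path for rootless Podman with a rootless cgroup set, while the patched check also sets it whenever Podman runs as root.

```go
package main

import "fmt"

// oldCondition mirrors the check before the patch (hypothetical helper):
// the pod cgroup path is set only when Podman is rootless AND a rootless
// cgroup parent is set.
func oldCondition(isRootless, rootlessCgroupSet bool) bool {
	return isRootless && rootlessCgroupSet
}

// newCondition mirrors the check after the patch (hypothetical helper):
// the pod cgroup path is set when Podman runs as root, OR when a rootless
// cgroup parent is set.
func newCondition(isRootless, rootlessCgroupSet bool) bool {
	return !isRootless || rootlessCgroupSet
}

func main() {
	// Print the full truth table for both conditions.
	for _, isRootless := range []bool{false, true} {
		for _, cgroupSet := range []bool{false, true} {
			fmt.Printf("rootless=%-5t rootlessCgroupSet=%-5t old=%-5t new=%t\n",
				isRootless, cgroupSet,
				oldCondition(isRootless, cgroupSet),
				newCondition(isRootless, cgroupSet))
		}
	}
}
```

The only combination that still skips the assignment after the patch is rootless Podman without a rootless cgroup set. The root case from this report (`rootless: false`) now takes the branch.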

@mheon
Member

mheon commented Jul 10, 2023

I think that fixes the bug right here, but it looks like this might be showing a more serious issue. Pod cgroups are not being refreshed properly after a reboot, so all resource limits set at the pod level are being lost after the system reboots. Need more testing to fully confirm.

That patch definitely fixes a serious issue, though (inability to create containers after reboot), so feel free to submit it - we have Podman 4.6 upcoming and I'd love to have this fixed there.

pjannesen added a commit to pjannesen/podman that referenced this issue Jul 10, 2023
[NO NEW TESTS NEEDED]
Closes containers#19175

Signed-off-by: Peter Jannesen <peter@jannesen.com>
@pjannesen
Contributor Author

The pull request checks failed on what looks like a random, unrelated error. How should I proceed?

@mheon
Member

mheon commented Jul 11, 2023

Reopening to track cgroup-reboot issue

@mheon mheon reopened this Jul 11, 2023
ashley-cui pushed a commit to ashley-cui/podman that referenced this issue Jul 13, 2023
[NO NEW TESTS NEEDED]
Closes containers#19175

Signed-off-by: Peter Jannesen <peter@jannesen.com>
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Aug 11, 2023

@giuseppe PTAL

@giuseppe
Member

giuseppe commented Sep 7, 2023

This turned out to be a much bigger issue, so I opened a PR: #19888

giuseppe added a commit to giuseppe/libpod that referenced this issue Sep 8, 2023
When a container is created and it is part of a pod, we ensure the pod
cgroup exists so limits can be applied on the pod cgroup.

Closes: containers#19175

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 11, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2023
4 participants