Podman machine does not stop correctly while running a container #22515

Closed
cbr7 opened this issue Apr 26, 2024 · 11 comments · Fixed by #23097
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. machine macos MacOS (OSX) related podman-desktop remote Problem is in podman-remote stale-issue

Comments

@cbr7

cbr7 commented Apr 26, 2024

Issue Description

On version 5.0.2 on macOS, it seems the podman machine cannot be stopped correctly while it has at least one container running.

Steps to reproduce the issue

  1. Have podman 5.0.2 installed.
  2. Create a podman machine.
  3. Pull an image and run it as a container.
  4. After the container has started, try to stop the podman machine.
  5. Notice that the error "Error: failed waiting for vm to stop" is thrown.
  6. At this point the podman machine still shows as running in podman machine list, but running podman images throws the following error: "Cannot connect to Podman. Please verify your connection to the Linux system using podman system connection list, or try podman machine init and podman machine start to manage a new Linux VM
    Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:58659->127.0.0.1:53782: read: connection reset by peer"

Describe the results you received

Error thrown when stopping podman machine

Describe the results you expected

Podman machine successfully stops

podman info output

Error: failed waiting for vm to stop

Error: failed waiting for vm to stopode 125

============================================

vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman images
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:56370->127.0.0.1:53782: read: connection reset by peer

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman version
Client: Podman Engine
Version: 5.0.2
API Version: 5.0.2
Go Version: go1.22.2
Git Commit: 3304dd9
Built: Wed Apr 17 21:13:18 2024
OS/Arch: darwin/arm64

Server: Podman Engine
Version: 5.0.2
API Version: 5.0.2
Go Version: go1.21.9
Built: Wed Apr 17 02:00:00 2024
OS/Arch: linux/arm64
vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman info
host:
  arch: arm64
  buildahVersion: 1.35.3
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc39.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 97.55
    systemPercent: 1.36
    userPercent: 1.09
  cpus: 6
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: coreos
    version: "39"
  eventLogger: journald
  freeLocks: 2048
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.8.4-200.fc39.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 12158222336
  memTotal: 12620021760
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc39.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.fc39.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc39.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240405.g954589b-1.fc39.aarch64
    version: |
      pasta 0^20240405.g954589b-1.fc39.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
      https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/501/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 1m 32.00s
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 99252940800
  graphRootUsed: 3804274688
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1713312000
  BuiltTime: Wed Apr 17 02:00:00 2024
  GitCommit: ""
  GoVersion: go1.21.9
  Os: linux
  OsArch: linux/arm64
  Version: 5.0.2

Additional information

This seems to happen consistently on macOS, but I was not able to reproduce it on Windows 11.

@cbr7 cbr7 added the kind/bug Categorizes issue or PR as related to a bug. label Apr 26, 2024
@github-actions github-actions bot added macos MacOS (OSX) related remote Problem is in podman-remote labels Apr 26, 2024
@benoitf
Contributor

benoitf commented Apr 26, 2024

@cbr7 could you add the image you're using / pulling / running?

@cbr7
Author

cbr7 commented Apr 26, 2024

@benoitf I was able to reproduce the issue with the image ghcr.io/linuxcontainers/alpine:latest.

@benoitf
Contributor

benoitf commented Apr 26, 2024

$ podman machine start
$ podman run --rm -it fedora

In another terminal:

$ podman machine stop

Then the stop is delayed by 1 min 30 s.
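
To put a number on that delay, the stop command can be timed from a small script. This is only a sketch in Python: the helper name `timed_run` is made up for illustration, and it assumes `podman` is on the PATH with a machine already running and a container active inside it.

```python
import subprocess
import time


def timed_run(argv: list[str]) -> tuple[int, float]:
    """Run a command to completion and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    # check=False: a non-zero exit (such as the failure reported in this
    # issue) is returned to the caller instead of raising an exception.
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.monotonic() - start
```

Calling `timed_run(["podman", "machine", "stop"])` while a container is running should then show both the exit code and how long the stop took.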

@Luap99 Luap99 added the machine label Apr 26, 2024
@Luap99
Member

Luap99 commented Apr 26, 2024

From some internal discussion:

  1. podman machine stop should wait longer (at least 90 seconds), as shutdown can be delayed for many reasons.
  2. Investigate a better way to stop containers when they don't react to SIGTERM. The default podman stop timeout is 10s, so we should likely not rely on systemd to stop them and wait 90s.
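
The escalation behind point 2 (send SIGTERM, wait a bounded time, then fall back to SIGKILL) can be illustrated on a plain child process. This is a sketch in Python, not Podman's implementation: `stop_process` is a hypothetical helper, its 10s default only mirrors the podman timeout mentioned above, and it assumes a POSIX system.

```python
import signal
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Ask a child to exit with SIGTERM, then SIGKILL it if it does not comply.

    Returns "already-exited", "terminated" (exited within the timeout)
    or "killed" (had to be force-killed).
    """
    if proc.poll() is not None:
        return "already-exited"
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught or ignored, so this always ends the child.
        proc.kill()
        proc.wait()
        return "killed"
```

With this shape, a process that ignores SIGTERM costs at most `timeout` seconds before it is killed, rather than holding up the whole shutdown.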

@mheon
Member

mheon commented Apr 26, 2024

For podman machine, possibly investigate reducing the 90s systemd timeout as well? When I want the VM down, I want it down quickly, and it's unlikely that containers in a machine VM are production-critical, so an early SIGKILL shouldn't hurt that much.

A friendly reminder that this issue had no activity for 30 days.

@odockal

odockal commented Jun 25, 2024

Any update on the issue?

@Luap99
Member

Luap99 commented Jun 25, 2024

Yes. For 2, #23064 fixes the long systemd stop timeout when the container does not exit on SIGTERM.

For 1, I can open a PR to increase the timeout. I guess at some point (maybe after 90s) we should terminate the VM forcefully and print a warning. I don't think machine stop should ever return an error if the shutdown takes too long.
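
The behavior proposed for point 1 (wait for a graceful shutdown, then force-terminate with a warning instead of returning an error) might look roughly like the sketch below. It is written in Python for illustration only. `stop_vm` and its three callables are hypothetical stand-ins for whatever the machine provider really does, and the 90s default is the figure discussed in this thread.

```python
import time
import warnings
from typing import Callable


def stop_vm(
    request_shutdown: Callable[[], None],
    is_running: Callable[[], bool],
    force_terminate: Callable[[], None],
    grace_seconds: float = 90.0,
    poll_seconds: float = 0.5,
) -> bool:
    """Stop a VM; return True if it shut down gracefully, False if forced."""
    request_shutdown()
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_running():
            return True
        time.sleep(poll_seconds)
    # One last check so a VM that stopped during the final sleep is not killed.
    if not is_running():
        return True
    # Warn instead of raising: the caller still ends up with a stopped VM.
    warnings.warn(
        f"VM did not stop within {grace_seconds:.0f}s, terminating it forcefully",
        stacklevel=2,
    )
    force_terminate()
    return False
```

The point of the design is that both paths leave the VM stopped, so the caller never sees a "failed waiting for vm to stop" style error; a slow shutdown only produces a warning.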

@Luap99
Member

Luap99 commented Jun 25, 2024

Feel free to test if #23097 works for you

@odockal

odockal commented Jun 26, 2024

@Luap99 Thanks! @cbr7 Can you take a look, please?

@cbr7
Author

cbr7 commented Jun 26, 2024

@odockal sure

mheon pushed a commit to mheon/libpod that referenced this issue Jul 10, 2024
The current timeout was not long enough. Systemd default is 90s so we
should wait for at least that long. Also it really doesn't make sense to
throw an error saying we failed waiting for stop. We should hard
terminate the VM in case a graceful shutdown did not happen.

Fixes containers#22515

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 25, 2024
5 participants