Container with podman network not receiving UDP traffic #1045

Open
booleanvariable opened this issue Aug 1, 2024 · 6 comments
@booleanvariable

Issue Description

When a simple Python server container listens on a UDP socket with a podman network attached, UDP traffic sent to the published port does not arrive.

Both 5.2.0-dev-5d10f77da and 4.9.4-rhel were tried, with the same results.

This is an MRE (minimal reproducible example) of the issue we are having in production. Docker works, podman with CNI works, but podman with netavark exhibits this issue. Note that restarting our UDP devices or changing their source port is very cumbersome, and we wish to avoid it.

Steps to reproduce the issue


  1. Create a Dockerfile:
FROM python:latest
WORKDIR /usr/local/bin
COPY server.py .
# Make the script executable at build time; only the last CMD in a Dockerfile takes effect.
RUN chmod +x server.py
CMD ["server.py"]
  2. Create the corresponding server script, server.py:
#!/bin/python3
import socket

# Listen for UDP datagrams on port 17000 on all interfaces.
server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_socket.bind(('', 17000))

while True:
    message, address = server_socket.recvfrom(1024)
    print(f"received from: {address}: {message}", flush=True)
  3. podman build . -t podman_udp_test
  4. podman network create podman_udp
  5. Start sending UDP traffic to port 17000 with nping: nping -g 17580 -p 17000 -c 1000000 --udp 127.0.0.1
  6. podman run -p 17000:17000/udp --net podman_udp podman_udp_test
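
If nping is not available, the sending step can be approximated with a few lines of Python. This is only a sketch: the function name `send_from_fixed_port`, the payload and the interval parameter are made up for illustration and are not part of the original report. The one thing it has to reproduce is the fixed source port (17580 in the nping command), because that is what keeps every datagram in the same flow.

```python
import socket
import time


def send_from_fixed_port(dest_host, dest_port, source_port, count,
                         interval=0.0, payload=b"ping"):
    """Send `count` UDP datagrams to (dest_host, dest_port) from one fixed source port.

    Hypothetical helper for illustration; returns the number of datagrams sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Binding pins the source port (like nping's -g option), so every
        # datagram has the same source port and destination port.
        sock.bind(("", source_port))
        for _ in range(count):
            sock.sendto(payload, (dest_host, dest_port))
            sent += 1
            if interval:
                time.sleep(interval)
    return sent
```

Calling `send_from_fixed_port("127.0.0.1", 17000, 17580, count=1000, interval=1.0)` before starting the container should mimic the nping step; the count and the one-second interval are arbitrary choices.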

Describe the results you received

No output

Describe the results you expected

Output from the server after receiving packets

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.8
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: fb8c4bf50dbc044a338137871b096eea8041a1fa'
  cpuUtilization:
    idlePercent: 99.38
    systemPercent: 0.28
    userPercent: 0.35
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: rhel
    version: "9.4"
  eventLogger: journald
  freeLocks: 2032
  hostname: ccms-pod
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-427.18.1.el9_4.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 640634880
  memTotal: 8058433536
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-3.el9_4.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 5367644160
  swapTotal: 5368705024
  uptime: 583h 32m 27.00s (Approximately 24.29 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 5
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 47173337088
  graphRootUsed: 20552769536
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 34
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1719829634
  BuiltTime: Mon Jul  1 18:27:14 2024
  GitCommit: ""
  GoVersion: go1.21.11 (Red Hat 1.21.11-1.el9_4)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-rhel

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

There was no difference between running nping on localhost and running it on a different machine that can reach the podman container.

Additional information

Starting the python server first and then starting the UDP sender works as expected but this doesn't help our use case.

Stopping and restarting the UDP sender program while the container is running doesn't help. Only by changing the source port of the UDP sender program does traffic start being received, but we cannot easily change the source port of the UDP traffic.

@Luap99
Member

Luap99 commented Aug 2, 2024

This is likely because we do not change any conntrack entries in netavark. We must call into the kernel netlink API to drop the stale entries, and last I checked our netlink library did not have any support for conntrack types, so we would need to implement the types from scratch, which is a lot of work.
In any case this is a netavark issue, so I am moving it there.

Note that if you are a RHEL user, it is best to report this through the Red Hat support channels so it can be better prioritized.

@Luap99 Luap99 removed the network label Aug 2, 2024
@Luap99 Luap99 transferred this issue from containers/podman Aug 2, 2024
@booleanvariable
Author

Is there a possible workaround?

@Luap99
Member

Luap99 commented Aug 5, 2024

Manually clear the conntrack entries (assuming that is actually what is causing the issue you are having).

@woodsb02

I am having this same issue after restarting a pod that uses quadlets (systemctl restart app-pod.service).

I was able to work around it by manually clearing the conntrack entries as suggested.

conntrack -L conntrack  | grep 514
conntrack -D conntrack --proto udp --orig-src 192.168.20.1 --orig-dst 192.168.20.2 --sport 514 --dport 5141
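
For anyone who wants to script this workaround, the delete command can be assembled from the flow's addresses and ports. The sketch below only builds the argument list and prints it; the helper name `conntrack_delete_argv` is hypothetical, and the flags are the ones used in the command above.

```python
import shlex


def conntrack_delete_argv(proto, orig_src, orig_dst, sport, dport):
    """Build the argv for deleting one flow with the conntrack CLI.

    Hypothetical helper; it mirrors the flags of the manual command in this thread.
    """
    return [
        "conntrack", "-D", "conntrack",
        "--proto", proto,
        "--orig-src", orig_src,
        "--orig-dst", orig_dst,
        "--sport", str(sport),
        "--dport", str(dport),
    ]


# Same addresses and ports as in the manual command.
argv = conntrack_delete_argv("udp", "192.168.20.1", "192.168.20.2", 514, 5141)
print(shlex.join(argv))
```

To actually delete the entry, the list could be passed to `subprocess.run(argv)` as root, for example from a hook that runs after the container or pod has started.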

Is there any way I can help troubleshoot why this is happening, and help fix it, so this workaround isn't required?

@booleanvariable
Author

> I am having this same issue after restarting a pod that uses quadlets (systemctl restart app-pod.service).
>
> I was able to work around it by manually clearing the conntrack entries as suggested.
>
> conntrack -L conntrack | grep 514
> conntrack -D conntrack --proto udp --orig-src 192.168.20.1 --orig-dst 192.168.20.2 --sport 514 --dport 5141
>
> Any way I can help troubleshoot why this is happening, and help fix it, so this workaround isn't required?

Manually clearing the conntrack entries was an acceptable workaround for us in production. Otherwise, I cannot offer any further information about this, sorry.

@Luap99
Member

Luap99 commented Jan 6, 2025

> Any way I can help troubleshoot why this is happening, and help fix it, so this workaround isn't required?

It happens because the kernel keeps the conntrack entry around for a while. I am not sure for how long, but that is not important.

What needs to happen is for netavark to learn how to flush these entries on setup/teardown. And this requires us to talk to the proper kernel APIs like conntrack does. Calling the conntrack command from netavark would not seem acceptable to me.
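
The symptoms reported in this thread fit a simple picture, sketched below as a toy model. It is not how the kernel or netavark is implemented. It assumes only that port-forwarding (DNAT) rules are evaluated for the first packet of a flow and that the result is cached in the flow's conntrack entry. The class `ToyConntrack`, the sender address 192.0.2.10 and the container address 10.89.0.2 are all made up for illustration.

```python
class ToyConntrack:
    """Deliberately simplified model: NAT rules are consulted only for new flows."""

    def __init__(self):
        self.dnat_rules = {}  # (proto, dst_port) -> container address
        self.entries = {}     # flow tuple -> destination chosen for the flow's first packet

    def add_dnat(self, proto, dst_port, container_addr):
        self.dnat_rules[(proto, dst_port)] = container_addr

    def deliver(self, proto, src, sport, dport):
        flow = (proto, src, sport, dport)
        if flow not in self.entries:
            # First packet of a flow: evaluate the rules once and remember the result.
            self.entries[flow] = self.dnat_rules.get((proto, dport), "host")
        return self.entries[flow]

    def delete(self, proto, src, sport, dport):
        self.entries.pop((proto, src, sport, dport), None)


ct = ToyConntrack()
before = ct.deliver("udp", "192.0.2.10", 17580, 17000)    # sender starts first
ct.add_dnat("udp", 17000, "10.89.0.2")                    # container starts later
stale = ct.deliver("udp", "192.0.2.10", 17580, 17000)     # same flow, cached result
new_port = ct.deliver("udp", "192.0.2.10", 17581, 17000)  # new source port, new flow
ct.delete("udp", "192.0.2.10", 17580, 17000)              # manual workaround
after_delete = ct.deliver("udp", "192.0.2.10", 17580, 17000)
print(before, stale, new_port, after_delete)
# prints: host host 10.89.0.2 10.89.0.2
```

In this model, traffic from the old source port keeps landing on the host until its entry is deleted, while a new source port reaches the container straight away. That matches the two observations reported above: changing the source port helps, and so does deleting the entry by hand.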
