docker network namespace ethernet interfaces race condition #20067

Open
irsl opened this issue Feb 6, 2016 · 4 comments
Labels
area/networking, kind/enhancement, version/1.10

Comments

irsl commented Feb 6, 2016

I started compiling this bug report while still on Docker 1.9. Then I realized 1.10 was out, so I upgraded, and found that the problem persisted (and had become even more nondeterministic).

docker version

Client:
Version: 1.10.0
API version: 1.22
Go version: go1.5.3
Git commit: 590d510
Built: Thu Feb 4 18:16:19 2016
OS/Arch: linux/amd64

Server:
Version: 1.10.0
API version: 1.22
Go version: go1.5.3
Git commit: 590d510
Built: Thu Feb 4 18:16:19 2016
OS/Arch: linux/amd64

docker info

Containers: 20
Running: 10
Paused: 0
Stopped: 10
Images: 309
Server Version: 1.10.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 538
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: bridge host null
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 2.985 GiB
Name: builder
ID: DJQ3:G446:H6DX:GRRV:YTTX:EKUF:HUB2:UFUO:MFVA:G3EB:XPOU:PIAM
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

I noticed that the network interfaces inside a container's network stack are sometimes not listed in the same order, even though the containers were started with the same commands in the same order.
When an application does not bind a source IP explicitly, the kernel picks the address of the first listed interface rather than that of eth0. This in turn affects my firewall configuration.

Expected order of interfaces:
lo
eth0: main_network (the one used with run or create)
eth1: other_network (connected by docker network command)
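
One way to check which source address the kernel would choose for an unbound socket is to ask the routing table directly with `ip route get`. The helper below is only a sketch: `route_src` is a hypothetical name, and the destination 8.8.8.8 in the usage comment is an arbitrary outside address chosen for illustration.

```shell
# Print the "src" address found in `ip route get` output read from stdin.
route_src() {
    awk '{
        for (i = 1; i < NF; i++)
            if (!found && $i == "src") { print $(i + 1); found = 1 }
    }'
}

# Usage inside a container (assumes iproute2 is installed there):
#   ip route get 8.8.8.8 | route_src
```

If the printed address belongs to other_network rather than main_network, firewall rules keyed on the main_network address would not match outgoing traffic.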

With Docker 1.9

With Docker 1.9 I used the following commands:

docker run --name=container --net=main_network ...
docker network connect other_network container

I managed to reproduce the problem by repeating the above run/connect commands in a simple for loop. The interfaces came up in the expected order in about 98-99 of 100 attempts.
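
A loop of this kind could look roughly like the sketch below. It is not the exact script used for this report: the function names, the `order-test-N` container names, and the `sleep 60` command are assumptions, and the image (elided as `...` above) has to be passed in by the caller and must provide `ip` and `sleep`.

```shell
# Print interface names in the order they appear in `ip addr` output (stdin).
iface_order() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)     # drop the trailing colon
        sub(/@.*/, "", name)    # drop a veth peer suffix such as "@if511"
        printf "%s%s", sep, name
        sep = " "
    } END { print "" }'
}

# Run the run/connect sequence repeatedly and report unexpected orders.
# $1: image to run (caller-supplied), $2: number of attempts (default 100).
reproduce() {
    image=$1
    runs=${2:-100}
    bad=0
    i=1
    while [ "$i" -le "$runs" ]; do
        name="order-test-$i"
        docker run -d --name="$name" --net=main_network "$image" sleep 60 >/dev/null
        docker network connect other_network "$name"
        order=$(docker exec "$name" ip addr | iface_order)
        if [ "$order" != "lo eth0 eth1" ]; then
            bad=$((bad + 1))
            echo "attempt $i: $order"
        fi
        docker rm -f "$name" >/dev/null
        i=$((i + 1))
    done
    echo "$bad of $runs attempts were out of order"
}
```

Calling `reproduce <image> 100` prints every attempt whose listing order differs from `lo eth0 eth1`, followed by a summary line.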

An example when something was messed up:
512: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:16:00:68 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.104/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe16:68/64 scope link
       valid_lft forever preferred_lft forever
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
510: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:67 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.103/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:67/64 scope link
       valid_lft forever preferred_lft forever

With Docker 1.10

With Docker 1.10 I used the following commands:

docker create --name=container --net=main_network ...
docker network connect other_network container
docker start container

When I ran these commands in a loop, the order of the network interfaces varied more than it did with 1.9.

An example when the interfaces were not in the expected order:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
878: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:68 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.104/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:68/64 scope link
       valid_lft forever preferred_lft forever
880: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:16:00:69 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.105/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe16:69/64 scope link
       valid_lft forever preferred_lft forever

Contributor

mavenugo commented Feb 6, 2016

@irsl I cannot comment on the "order" of the ip link output, as that is beyond the scope of Docker. But I can explain the interface creation process.

In Docker 1.9, a container could not be connected to an additional network until it had been started. Hence, at start time, the only network the container was attached to was the one given by --net. Later, when the container was connected to a new network, new endpoints were created and attached. It can therefore be guaranteed that the container's eth0 is the endpoint of the network connected first, followed by eth1 and so forth. But if the container is restarted, Docker doesn't provide any guarantee about the order of the eth interfaces.

With 1.10, none of the above has changed. The only difference is that the daemon allows a container to be connected to multiple networks before it is started. Hence the same non-guarantee as in 1.9 applies to this case as well.

Since you are using docker create in 1.10 rather than docker run (as in 1.9), you are more likely to run into this non-guarantee. If you switch back to docker run in 1.10, followed by the connects, it will work the same way as in 1.9.

Again, as far as the order of the ip link output goes, I don't know whether the above explanation is useful.

Author

irsl commented Feb 7, 2016

Thank you for your response.

I believe Docker should offer a reproducible way to specify which network the outgoing, wider-world connections of multihomed containers are routed through.

I propose:
The docker network connect command would offer a new (optional) option --metric, with a default value of 0.
Docker would configure a default gateway for each network a container is connected to, honoring the metric specified.
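
To illustrate what the proposal would amount to inside the container, here is a sketch that turns a list of (interface, gateway, metric) triples into the corresponding iproute2 commands. Note that `--metric` is the proposed option and does not exist in Docker; `metric_routes` is a hypothetical helper, and the gateway addresses in the usage comment are assumed values.

```shell
# Read "device gateway metric" triples from stdin and print one
# `ip route add default` command per line. A missing metric defaults to 0,
# matching the proposed default.
metric_routes() {
    while read -r dev gw metric; do
        if [ -z "$dev" ]; then
            continue    # skip blank lines
        fi
        echo "ip route add default via $gw dev $dev metric ${metric:-0}"
    done
}

# Usage (gateway addresses are assumptions):
#   printf 'eth0 172.18.0.1 0\neth1 172.22.0.1 10\n' | metric_routes
```

With one default route per network, the kernel prefers the route with the lowest metric, so the outgoing path would no longer depend on the order in which the interfaces were created.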

What do you think?

thaJeztah added the kind/enhancement and area/networking labels Feb 16, 2016
Contributor

wnagele commented Feb 24, 2017

We have some daemons that require a strict binding to an Ethernet interface. Given the dynamic nature of these interfaces, one option would have been to add logic to the containers to figure out when an interface becomes available and which one is the right one (based on IP assignment).
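
For reference, the in-container logic described above might be sketched as follows. This is not what we deployed; the function names, the one-second polling interval, and the default of 30 attempts are assumptions, and the container needs iproute2.

```shell
# Read `ip -o -4 addr show` output from stdin and print the name of the
# interface that carries the IPv4 address given as $1.
iface_for_ip() {
    awk -v want="$1" '$3 == "inet" {
        split($4, parts, "/")          # strip the prefix length
        if (parts[1] == want) print $2
    }'
}

# Poll until an interface with address $1 appears, then print its name.
# $2: maximum number of one-second attempts (default 30).
wait_for_ip() {
    want=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        dev=$(ip -o -4 addr show | iface_for_ip "$want")
        if [ -n "$dev" ]; then
            echo "$dev"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A daemon's entrypoint could then bind to the interface printed by `wait_for_ip <address>`, at the cost of having to know the expected address in advance.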

We opted instead to create a patch that allows the interface prefix (normally eth) to be set per network. This results in an interface name of the form prefix+num, where num is still a dynamically assigned, increasing number. In our case a container is connected to any given network only once, so num always remains 0, which works for us. A completely static interface name is only possible if the user prevents collisions manually, which is contrary to the current approach for this configuration.
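
The naming scheme can be pictured with the small model below. It is only an illustration and not the patch itself: it assumes the lowest free index is chosen for a prefix, and both `next_iface_name` and the `mgmt` prefix in the usage comment are hypothetical.

```shell
# Print the first name of the form <prefix><num> that is not already taken.
# $1: prefix; remaining arguments: interface names already present.
next_iface_name() {
    prefix=$1
    shift
    n=0
    while :; do
        candidate="$prefix$n"
        taken=0
        for existing in "$@"; do
            if [ "$existing" = "$candidate" ]; then
                taken=1
            fi
        done
        if [ "$taken" -eq 0 ]; then
            echo "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
}

# Usage:
#   next_iface_name eth lo eth0    # default prefix, index depends on what exists
#   next_iface_name mgmt lo eth0   # per-network prefix starts at 0
```

Because each prefix is counted separately in this model, a network with its own prefix yields index 0 as long as the container is connected to it only once.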

The current patch (based on release 1.12.3) is available here: https://github.com/unwired/docker/commit/0418707e53c9c2275f91fd141c30ceadab2fb227
If the Docker team (@mavenugo) wants, I would be happy to provide an upstream pull request for this. Feedback is always very welcome.

@thaJeztah
Member

@wnagele It looks like all the changes are in the libnetwork code, so if you want to open a PR, it will have to go into https://github.com/docker/libnetwork. Since you already have the code ready, it's worth opening a pull request for discussion.
