-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement reference counting of volume mounts in amazon-ecs-volume-plugin #4425
Conversation
94c668a
to
0652e36
Compare
(bold italic emphasis above mine) Is there any case where mount ID would NOT have a mount count of 1 for existing mounts in old state files? |
Since older state files do not have this information, we cannot predict the correct count for a mount ID. The current plugin has no concept of mount counts. That being said, the only case I know where a mount count goes over 1 is when So, our best bet is to initialize mount count for mounts tracked by older plugins to 1. It should be correct for a big majority of cases since we haven't received too many issues due to this bug. |
Summary
ECS Agent uses amazon-ecs-volume-plugin for mounting and unmounting EFS volumes on ECS container instances. The plugin implements Docker volume plugins protocol. It supports operations for creating/removing a volume (called when
docker volume create
anddocker volume rm
or equivalent happen) and mounting/unmounting clients to a volume (happens during container create/stop and other events such asdocker cp
). Each mount/unmount request has a unique ID for the mount.Currently, the plugin assumes that it will only get one mount and unmount request for a mount ID, however this is not true and we have seen
docker cp
command reusing the same mount ID as container start/stop. Docker's expectation is for the plugin to keep track of number of mounts for a mount ID, that is, it should increment the number of mounts for mount requests and decrement for unmount requests. Due to this mismatch in expectations, volume plugin currently ignores a mount request with a mount ID it is already tracking and ends up unmounting the volume when it gets a corresponding unmount request for the mount ID. So, adocker cp
operation on the volume on the container can cause the plugin to unmount the volume from the host prematurely.This PR fixes this issue by updating the plugin to keep a track of number of mounts for a mount ID.
Related to moby/moby#34665
Implementation details
Volume.Mounts
field frommap[string]*string
(a set) tomap[string]int
. This field keeps track of mounts on a volume and I am changing its type so that it can keep track of number of mounts for a mount ID.Volume.AddMount
andVolume.Unmount
methods accordingly so that they increment and decrement the number of mounts for a mount ID.VolumeInfo.Mounts
field accordingly. This type is used to persist and load state of a volume from disk.AmazonECSVolumePlugin.LoadState
method so that it resets anyVolumInfo.Mounts
zero values to one. This is to make this change backwards-compatible with old state files that do not have mount counts. Old state files havenull
against mount IDs and JSON decoding convertsnull
s to0
by default. The method updates0
s to1
so that existing mounts in old state files start with a mount count of 1 instead of 0. Typically a mount ID would always have a mount count of 1 for most cases, so this change should help all such users to upgrade the plugin version safely.Testing
I ran a task with a container that is configured to have an EFS volume mounted and verified that
docker cp
no longer causes premature unmounting of the EFS volume from the container instance. I also monitored the state file and verified that it has counts for mount IDs now.Also verified that the updated plugin is able to read old state files and convert
null
against mount IDs to a mount count of1
.Also ran a task with two containers and verified that the plugin is able to keep track of mount counts for both containers correctly in the state file. Each container gets its own mount ID and
docker cp
on one container does not affect the other. Also verified that stopping one container does not cause the volume to unmount from the container instance and stopping the second container afterwards does cause the volume to be unmounted from the container instance as expected.New tests cover the changes: yes
Description for the changelog
bugfix: Implement reference counting of volume mounts in amazon-ecs-volume-plugin
Additional Information
Does this PR include breaking model changes? If so, Have you added transformation functions?
Does this PR include the addition of new environment variables in the README?
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.