Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reference counting of volume mounts in amazon-ecs-volume-plugin #4425

Merged
merged 4 commits into from
Nov 13, 2024

Conversation

amogh09
Copy link
Contributor

@amogh09 amogh09 commented Nov 12, 2024

Summary

ECS Agent uses amazon-ecs-volume-plugin for mounting and unmounting EFS volumes on ECS container instances. The plugin implements Docker volume plugins protocol. It supports operations for creating/removing a volume (called when docker volume create and docker volume rm or equivalent happen) and mounting/unmounting clients to a volume (happens during container create/stop and other events such as docker cp). Each mount/unmount request has a unique ID for the mount.

Currently, the plugin assumes that it will only get one mount and unmount request for a mount ID, however this is not true and we have seen docker cp command reusing the same mount ID as container start/stop. Docker's expectation is for the plugin to keep track of number of mounts for a mount ID, that is, it should increment the number of mounts for mount requests and decrement for unmount requests. Due to this mismatch in expectations, volume plugin currently ignores a mount request with a mount ID it is already tracking and ends up unmounting the volume when it gets a corresponding unmount request for the mount ID. So, a docker cp operation on the volume on the container can cause the plugin to unmount the volume from the host prematurely.

This PR fixes this issue by updating the plugin to keep a track of number of mounts for a mount ID.

Related to moby/moby#34665

Implementation details

  1. Update the type of Volume.Mounts field from map[string]*string (a set) to map[string]int. This field keeps track of mounts on a volume and I am changing its type so that it can keep track of number of mounts for a mount ID.
  2. Update Volume.AddMount and Volume.Unmount methods accordingly so that they increment and decrement the number of mounts for a mount ID.
  3. Update VolumeInfo.Mounts field accordingly. This type is used to persist and load state of a volume from disk.
  4. Update AmazonECSVolumePlugin.LoadState method so that it resets any VolumInfo.Mounts zero values to one. This is to make this change backwards-compatible with old state files that do not have mount counts. Old state files have null against mount IDs and JSON decoding converts nulls to 0 by default. The method updates 0s to 1 so that existing mounts in old state files start with a mount count of 1 instead of 0. Typically a mount ID would always have a mount count of 1 for most cases, so this change should help all such users to upgrade the plugin version safely.

Testing

I ran a task with a container that is configured to have an EFS volume mounted and verified that docker cp no longer causes premature unmounting of the EFS volume from the container instance. I also monitored the state file and verified that it has counts for mount IDs now.

Also verified that the updated plugin is able to read old state files and convert null against mount IDs to a mount count of 1.

Also ran a task with two containers and verified that the plugin is able to keep track of mount counts for both containers correctly in the state file. Each container gets its own mount ID and docker cp on one container does not affect the other. Also verified that stopping one container does not cause the volume to unmount from the container instance and stopping the second container afterwards does cause the volume to be unmounted from the container instance as expected.

New tests cover the changes: yes

Description for the changelog

bugfix: Implement reference counting of volume mounts in amazon-ecs-volume-plugin

Additional Information

Does this PR include breaking model changes? If so, Have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@amogh09 amogh09 changed the title Implement reference counting of volume mounts in amazon-ecs-volume-pl… Implement reference counting of volume mounts in amazon-ecs-volume-plugin Nov 12, 2024
@amogh09 amogh09 force-pushed the volume-reference-counting branch from 94c668a to 0652e36 Compare November 12, 2024 21:26
@amogh09 amogh09 marked this pull request as ready for review November 12, 2024 21:30
@amogh09 amogh09 requested a review from a team as a code owner November 12, 2024 21:30
mye956
mye956 previously approved these changes Nov 12, 2024
@danehlim
Copy link
Contributor

Typically a mount ID would always have a mount count of 1 for most cases

(bold italic emphasis above mine)

Is there any case where mount ID would NOT have a mount count of 1 for existing mounts in old state files?

@amogh09
Copy link
Contributor Author

amogh09 commented Nov 13, 2024

Typically a mount ID would always have a mount count of 1 for most cases

(bold italic emphasis above mine)

Is there any case where mount ID would NOT have a mount count of 1 for existing mounts in old state files?

Since older state files do not have this information, we cannot predict the correct count for a mount ID. The current plugin has no concept of mount counts.

That being said, the only case I know where a mount count goes over 1 is when docker cp is used to copy files to/from a mounted volume in a running container. However, the count reverts back to 1 as soon as docker cp is done. Apparently docker cp does a mount/unmount sequence because it works for stopped containers also which are not guaranteed to have the volume still mounted so docker asks the plugin to mount it again just for docker cp and it reuses the same mount ID as that for container start/stop.

So, our best bet is to initialize mount count for mounts tracked by older plugins to 1. It should be correct for a big majority of cases since we haven't received too many issues due to this bug.

@amogh09 amogh09 merged commit 9494fb3 into aws:dev Nov 13, 2024
40 checks passed
JoseVillalta pushed a commit to JoseVillalta/amazon-ecs-agent that referenced this pull request Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants