CSI driver causes EFS filesystems to mount to wrong mountpoint #282
Want to make sure I understand: in the first example, instead of seeing upload003/upload003, that Pod saw one of upload003/upload00{0,1,2,4,5,6,7}? (ls: cannot access '/upload003/upload003': No such file or directory) Could you share logs of the CSI driver on the node to which such a Pod was scheduled? This script can help gather logs: /~https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/troubleshooting. If you are worried about sensitive info or the quantity of logs, feel free to send them over email or Slack.
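For example, roughly like this (a sketch only: namespace, label, and container names assume the default efs-csi-node DaemonSet from the upstream manifests and may differ in your install):

```sh
# Find which node the affected Pod ran on, then the efs-csi-node pod on that node.
NODE=$(kubectl get pod <affected-pod> -o jsonpath='{.spec.nodeName}')
kubectl get pods -n kube-system -l app=efs-csi-node \
  --field-selector spec.nodeName="$NODE" -o name
# Dump the driver container's logs from the pod found above.
kubectl logs -n kube-system <efs-csi-node-pod> -c efs-plugin > efs-csi-node.log
```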
Okay, I just sent you the logs via email. Your description of the issue is correct: /upload003 actually saw /upload003/upload005, so it was mounted to one of the other volumes. We verified this both by checking for the same number of files in both mount points (/upload003 and /upload005) and by verifying that a file with a UUID filename existed in both mount points.
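For reference, the cross-check we ran was roughly the following (the mount points are from our setup and the UUID is a placeholder for a known file):

```sh
# Identical file counts in two different shards are a strong hint of a mis-mount.
ls /upload003 | wc -l
ls /upload005 | wc -l
# A file that exists in one shard should not also appear in the other.
UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee   # placeholder name of a known file
ls -l /upload003/"$UUID" /upload005/"$UUID"
```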
We are seeing a similar issue as well. We do ~10 deployments daily and each pod has 2 EFS mounts. In the past month we have seen the mount order reversed at least 3 times. It's really hard to reproduce. We have updated the AMIs on all the EKS clusters to see if that resolves the issue. Will post logs next time we hit it.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind bug
We use a sharded EFS setup in which multiple EFS filesystems are mounted into one pod. Files are distributed among the shards randomly so that we don't hit an I/O bottleneck in any one filesystem. We have found that, at random, some pods come up with a different EFS share mounted than what the pod spec specifies. To check for this we created a file on each filesystem named after its correct mount point, so the filesystem that should be mounted at /upload001 contains a file called upload001, and ls /upload001/upload001 should succeed. What we found was that occasionally that file does not exist and a different filesystem is indeed mounted in a place it shouldn't be. See the printout below: for the second pod all the shards list the test file as expected, but for the first they do not. Note that these pods have identical specs, as they are created by the same Deployment. This never happened before we switched to the CSI driver, and we noticed it only because we were getting random 404 errors in our nginx deployment.
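The check itself is simple; roughly like this (the mount-point names match our layout, and the glob is illustrative):

```sh
#!/bin/sh
# Run inside the pod: each shard mounted at /uploadNNN should contain a marker
# file also named uploadNNN; anything missing means the wrong EFS filesystem
# is mounted at that path.
for mp in /upload*; do
  if [ ! -e "$mp/$(basename "$mp")" ]; then
    echo "MISMATCH: $(basename "$mp") missing under $mp"
  fi
done
```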
I am not 100% sure how to reproduce this. We have seen it in multiple deployments (nginx and gunicorn), and we are mapping about 17 filesystems to various mount points in each pod. My guess is that if you create a similar scenario, delete pods, and check them in the way outlined above, you will sometimes see the mapping fail.
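Something like the loop below might churn pods enough to eventually reproduce it (the deployment name, label, and check script are hypothetical; substitute your own):

```sh
# Hypothetical names: deployment/uploader, label app=uploader, /tmp/check-mounts.sh
for i in $(seq 1 20); do
  kubectl rollout restart deployment/uploader
  kubectl rollout status deployment/uploader --timeout=5m
  for pod in $(kubectl get pods -l app=uploader -o name); do
    kubectl exec "$pod" -- sh /tmp/check-mounts.sh
  done
done
```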
I just want to make clear that this is a serious issue. One of our cronjobs periodically garbage-collects files from our EFS shards: any file that does not have a corresponding database object containing the given EFS path is deleted. If we had been unlucky and this had occurred in the pod performing that cleanup, it would have deleted every file in the improperly mounted filesystem.
Kubernetes version (kubectl version): 1.17.9-eks-4c6976