-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OverlayFS support (d_revalidate out and support renameat2 flags) #9414
Conversation
Btw, my branch with changes needed to get Docker working (and xattrs working in 1st level userns) can be found here: |
Have you tried mounting this with |
Hmm should have let the tests finish before publishing the PR as open for comments :) Now I have to figure out why is zfs_link_create playing the -ENOSYS note. |
c2e3c19
to
aba3a50
Compare
Can I help myself and restart the tests somehow, please? Or is it up to the project leadership to come by and restart them? Also, I'm trying running tests locally and even without my patches still, like > 150 tests are failing - is this normal? I see that some of them are documented to fail, but I still get a really long list of those without any annotation (like, acl tests fail? hm...). |
|
1e054d6
to
d09d14b
Compare
So I think this will be a last force-push if the tests finish up fine; I had a creds memory leak in there and an ASSERT-wrapped code before, which caused some space leak and associated entertaining cleanup afterwards on the staging nodes... :) That is if I'm right in assuming all that's needed to fix the i386 build test is to restart it... is it so? |
9ef43df
to
ccd1137
Compare
Are there plans to move this forward? |
for (int i = 0; i < 16; i++) { | ||
int r = 0xFF; | ||
random_get_pseudo_bytes((void *)&r, 1); | ||
pos += snprintf(tmpname+pos, MAXPATHLEN, "%02x", r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed this should be MAXPATHLEN-pos
otherwise you could overrun the buffer.
I'll try to pick this one up again next week (the authorship of this PR is getting kinda funky now...). |
@snajpa I'm going to squash some of these patches together and add a |
Ah I forgot that #9600 hasn't been fully resolved. I guess I'll need to tackle that first (or we can just make the PR a "renameat2 flags" PR and we can deal with revalidation separately). |
Curious if there's been any progress on this or the required #9600 offline. Would be great to have drop-in support for replacing legacy docker setups with ZFS underpinnings but without rebuilding the storage graph. |
@sempervictus I've carried the renameat2 changes to #12209 but there is an issue with a necessary change to the integration tests which is being reviewed in #12588. However, it seems I will need to switch back to new TX types because the current mechanism of emulating the replay of the new features through other TX types is actually not crash-resistant during replay. As for #9600 I haven't had time to look and see what is necessary to implement the new test harness which was discussed in the relevant review PR (the d_revalidate change was merged but bugs were found so it was reverted). |
@cyphar - thank you very much for explaining that, a bit more complex than i'd thought. We're surviving on AUFS for the time being, but would be cool if at some point we could just use OS defaults for Docker. |
@sempervictus: Slightly OT, but may I ask whether you still stick to AUFS and/or compared it/switched to fuse-overlayfs in the meantime given that the former is unfortunately only supported by newer LTS kernels 5.10, 5.15 according to http://aufs.sourceforge.net/ which forced me to look for an alternative for the OEM kernel 5.14 on Ubuntu 20.04 ("Focal") with ZFS on root? |
Keeps workloads in the kernel, and have used AUFS since the SLAX days of old... works fine on 5.15. |
Closing, support for overlayfs was merged. |
It seems that ZFS has gained for overlayfs in openzfs/zfs#9414 Furthermore, such a change should not be mixed in with the cgroups changes.
Sorry to bring this one up again. Was this change complete ? If yes, which version of ZFS I need ? On Proxmox VE running on ZFS version zfs-2.1.12-pve1. Trying a docker pull of some image is failing with:
And kernel message get display:
Si either I'm not running the right version of it's still not working ? |
Yes, overlayfs works fine on zfs 2.2 and above. |
@BtbN Thanks ! I will wait for zfs v2.2 to be available and retest. Thanks |
From my personal notes on this subject: Upgraded to the Proxmox 6.5 kernel that includes OpenZFS 2.2 (instructions) The
warning is gone when creating an overlayfs on a ZFS dataset / inside a Proxmox LXC, using
The warning about
To my understanding, this warning is annoying but can be ignored for the typical use case of running overlayfs (namely, Docker) on top of a single ZFS pool:
I'd appreciate commentary on my analysis about the |
Description
This PR is comprised of several commits, the original motivation is to add the
functionality needed to enable OverlayFS on top of ZFS, which namely means
supporting renameat2() Linux syscall with its flags
(most interesting of which is the atomic swap of files with RENAME_EXCHANGE)
and getting rid of
d_revalidate
function, which makes ZFS appear as a remotefilesystem.
Bulk of the original work for supporting the renameat2() flags was done by
@cyphar; he proposes new IL record types, which I'd like as a best solution
as well - but I'm of the opinion that until more platforms adopt renameat2()
semantics in some way, it's not worth the trouble and I'm trying the compatible,
yet more ugly approach of emulating the otherwise atomic operations with
multiple non-atomic ones.
See the commit messages for more information on the relevant parts, I've kept
the code for new txtypes in the original @cyphar's commits.
Fixes #2256
Fixes #8648
Fixes #8774
How Has This Been Tested?
Tested with vanilla packaged Docker from Docker's repo, however, current Docker
checks for ZFS's magic and then refuses to even consider the overlay2 driver.
So I've been testing with a little cheat patch - vpsfreecz@6c7873e
The patches have been deployed to @vpsfreecz staging nodes, which revealed some bugs, that have been addressed in the later force-pushes.
Also tested with the @cyphar's renameat2 util and some parallel script-fu.
Types of changes
Checklist:
Signed-off-by
.