Kernel command line parameter kernelCommandLine=systemd.unified_cgroup_hierarchy=1 results in creation of both cgroup v1 and v2 hierarchies, a mixed setup that is now prohibited #6662

Open
PavelSosin-320 opened this issue Mar 10, 2021 · 68 comments

@PavelSosin-320

Environment

Microsoft Windows [Version 10.0.21327.1010]
(c) Microsoft Corporation. All rights reserved.
Fedora 34 (self-installed), but Ubuntu behaves the same way. This is a kernel issue.
WSL2
WSL Kernel Linux MSI-wsl 5.4.91-microsoft-standard-WSL2

Steps to reproduce

  1. Install a systemd-based distro such as Ubuntu or Fedora 33 Remix.
  2. Edit .wslconfig and add kernelCommandLine=systemd.unified_cgroup_hierarchy=1.
  3. Start the distro and attach to it using Windows Terminal (WT).
  4. Run ls /sys/fs/cgroup/: both /sys/fs/cgroup/systemd (v1 hierarchy) and /sys/fs/cgroup/unified (v2 hierarchy) are present, and /sys/fs/cgroup/ is cluttered with v1 controller mounts. The systemd.unified_cgroup_hierarchy=1 setting is misinterpreted.
  5. Install any recent OCI runtime (runC, crun), the Docker 20.10 daemon, or Podman 3.
  6. Run podman info (or docker info): the unified cgroup hierarchy is not recognized, and cgroup v1 is reported because /sys/fs/cgroup/systemd is present.

Only the cgroup v2 hierarchy should be built, because the "mixed" setup has been prohibited as a dead end. Recent runC (Docker 20.10) and crun releases have switched to support cgroup v2, which is necessary for rootless mode and therefore important for WSL users.
Conversion between mixed mode and cgroup v2 is no longer supported, for the reasons mentioned above.

WSL logs:

Expected behavior

Only the cgroup v2 hierarchy is created (/sys/fs/cgroup/unified/), and all controllers are put in the correct place.

Actual behavior

/sys/fs/cgroup is cluttered with unexpected content, such as per-controller directories and a systemd folder:

ls /sys/fs/cgroup
blkio cpu,cpuacct cpuset freezer memory net_cls,net_prio perf_event rdma unified
cpu cpuacct devices hugetlb net_cls net_prio pids systemd

Please fix this so that Docker and Podman can be upgraded to their recent releases and work for a rootless user. This is also a security issue, because the WSL root user has unlimited access to the Windows Program Files and ProgramData directories, i.e. it can inject any malicious executable into Windows and run it as MyVirus.exe.

@WSLUser

WSLUser commented Mar 10, 2021

WSL root user has unlimited access to the Windows program Files and program Data directories, i.e. can inject any malicious executive into Windows and run it as MyVirus.exe

Not true at all. The WSL root user has the same access as a normal Windows user. Go ahead and navigate to C:\Windows\System32 and try replacing one of the executables from within WSL2. It will fail.

@benhillis
Member

WSL does not use systemd so that setting is not being respected.

@PavelSosin-320
Author

@benhillis The WSL kernel may not support systemd; systemd is a separate component that can be provided by third-party software. But the WSL kernel doesn't correctly create either the cgroup v1 or the v2 hierarchy, and fails with:
[ 1.666152] cgroup1: Need name or subsystem set
[ 1.675386] ERROR: Mount:2486: mount((null), /sys/fs/cgroup/memory, cgroup, 0x20000e, memory
[ 1.675389] ) failed 22

As a result, the distro never reaches the running state, which also leads to further networking issues.

@WSLUser

WSLUser commented Apr 2, 2021

If you enable systemd yourself through something like genie and set it up to boot with that running first, do you still experience the issue?

@PavelSosin-320
Author

@WSLUser Even after upgrading to kernel 5.10.16.3-microsoft-standard-WSL2 and genie, the issue is not solved. cgroup management and systemd are tightly coupled, and the Linux kernel authors named the parameter systemd.unified_cgroup_hierarchy. If the WSL kernel doesn't support systemd itself, then I assume the parameter should simply be called unified_cgroup_hierarchy and should result in the creation of only the unified cgroup hierarchy, without cluttering other filesystems. Unfortunately, that doesn't work either. I'm afraid the entire kernelCommandLine property of the .wslconfig file is ignored. The log always shows the same
Command line: initrd=\initrd.img panic=-1 nr_cpus=2 swiotlb=force console=ttyS0,115200 debug pty.legacy_count=0
regardless of how I pass unified_cgroup_hierarchy: with or without the systemd. prefix, with or without quotes, etc.
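A quick way to check whether a kernelCommandLine value from .wslconfig reached the kernel at all is to look for it in /proc/cmdline, which holds the same string dmesg prints as "Command line:". Below is a minimal POSIX sh sketch; the function name cmdline_has is hypothetical, and the optional file argument exists only so the check can be tried against a saved copy:

```sh
#!/bin/sh
# cmdline_has PARAM [CMDLINE_FILE]
# Hypothetical helper: succeeds if PARAM appears as a whole
# whitespace-separated token on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline.
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put one token per line, then require an exact, fixed-string match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

For example, cmdline_has cgroup_no_v1=all || echo "not passed to the kernel". Matching whole tokens avoids false positives such as unified_cgroup_hierarchy=1 matching inside systemd.unified_cgroup_hierarchy=1.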

@WSLUser

WSLUser commented May 6, 2021

Ok, I'm going to ask you to do a couple of things. First, set up systemd using https://github.com/shayne/wsl2-hacks, with the script improvements shown in shayne/wsl2-hacks#7. Then compile the 5.10 WSL2 kernel using microsoft/WSL2-Linux-Kernel#245 for the config, following the steps at https://wsl.dev/wsl2-kernel-zfs/. Once you restart your distro, do you still experience the original issue (Docker unable to use cgroups v2)?

@benhillis
Copy link
Member

WSL doesn't run systemd, so expecting any of the systemd options to be honored will not work.

@PavelSosin-320
Author

I'm working in a different context: running the latest released Podman version on top of the Fedora 34 Remix distro built by Whitewater Foundry. systemd functionality is provided by systemd-genie and works very reliably, including cgroup management, user management, and session management for both root and rootless users. I'm testing both scenarios side by side because Podman provides almost equal functionality in both, with some minor restrictions for rootless users, mainly in networking and volume binding. I achieved almost full feature parity in both modes, with one exception: when Podman detects a cgroup v1 hierarchy in rootless mode, it falls back to cgroupfs, because systemd doesn't allow the mixed, backward-compatible mode and the systemd version used in Fedora 34 doesn't contain a converter. systemd version 226 uses the unified hierarchy by default.
Current releases of Ubuntu and Fedora, Docker and Podman all use the unified hierarchy, so the cgroup v1 hierarchy is simply unexpected and misleads Podman. According to the error messages I see in the console log, it is also not created correctly: attempts to create symbolic links for controllers fail:
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists
Failed to create symlink /sys/fs/cgroup/cpu: File exists
Failed to create symlink /sys/fs/cgroup/net_prio: File exists
Failed to create symlink /sys/fs/cgroup/net_cls: File exists
This is a very specific bug in the WSL kernel's cgroup implementation. I don't believe that rebuilding the kernel without a bug fix will help. I was surprised to see that the person who wrote this code for WSL is working on systemd today.

@cerebrate

Your problem isn't with systemd not seeing the option, because it is passed through: your problem is that systemd isn't the first (and can't be, because of the above-all-distros namespace) init, so by the time systemd gets its hands on it, cgroups have long been initialized.

More specifically, if you set the kernel command-line option cgroup_no_v1=all to try and force it by disabling the controllers for cgroups v1, the following happens at the end of boot:

[    4.424900] Run /init as init process
[    4.431087]   with arguments:
[    4.436243]     /init
[    4.440956]   with environment:
[    4.446506]     HOME=/
[    4.450957]     TERM=linux
[    4.457219] cgroup: Need name or subsystem set
[    4.464697] ERROR: Mount:2513: mount((null), /sys/fs/cgroup/memory, cgroup, 0x20000e, memory
[    4.464700] ) failed 22
[    4.483575] kvm: exiting hardware virtualization
[    4.493039] ACPI: Preparing to enter system sleep state S5
[    4.502509] reboot: Power down
[    4.527725] acpi_power_off called

i.e., it looks like the Microsoft init is making use of v1 memory cgroups, so it doesn't look like you can get to a unified cgroups v2-only hierarchy unless and until that changes.

@PavelSosin-320
Author

@cerebrate I totally agree with you, because the mixed hierarchy is created regardless of whether genie is used. All the error messages appear before the distro's banner message and the first systemd message. Some parts of the WSL kernel code were written 2 years ago, long before Linux distros and OCI runtimes adopted the unified hierarchy, when there were no real consumers for it. The backward-compatible mode was required by runC and by Docker Desktop for Windows, which was based on an old Docker CE version. Docker needs cgroup v2 starting from the 20.x versions.
Unfortunately, when I looked into the WSL kernel repository, I found that the person who wrote the initial code version has moved to systemd.io, the opposite camp.

@PavelSosin-320
Author

@WSLUser After all the upgrades to 5.10.16.3-microsoft-standard-WSL2 and genie 1.40, I'm still stuck on this issue: although the systemd.unified_cgroup_hierarchy parameter is passed and accepted by the WSL kernel, the kernel insists on creating or re-populating the cgroup v1 hierarchy and creates the unified one as well.
Log shows:
Failed to create symlink /sys/fs/cgroup/net_prio: File exists
Failed to create symlink /sys/fs/cgroup/net_cls: File exists
Failed to create symlink /sys/fs/cgroup/cpu: File exists
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists

I'm now passing the parameter as unified_cgroup_hierarchy, without the systemd. prefix. ...
It looks like the entire kernel command line option is ignored:
[ 0.000000] Command line: initrd=\initrd.img panic=-1 nr_cpus=2 swiotlb=force console=ttyS0,115200 debug pty.legacy_count=0.

With kernelCommandLine=cgroup_no_v1=all, no cgroup hierarchy is created at all, neither v1 nor v2.
I suppose that when the kernel was built 2 years ago, without the FUSE support present in 5.10, MS had problems mounting the unified cgroup FS. That is still impossible today unless mount.fuse is used, but that package is installed only as an OCI runtime dependency.

@WSLUser

WSLUser commented May 7, 2021

I would repurpose this issue to fix the proprietary init to support v2, since, as cerebrate has pointed out, that init is what prevents cgroups v2 from being supported. Of course, it's also possible that systemd is eventually adopted instead of that init, but that's an already known feature request.

@cerebrate
Copy link

@PavelSosin-320

The kernel doesn't do anything with the systemd.* command-line parameters, though, because they aren't kernel parameters. (As you can see, they don't show up in /proc/sys/kernel or the output of sysctl -a.) If you check bootparam(7), what you will see is this:

Any remaining arguments that were not picked up by the kernel and were not interpreted as environment variables are then passed onto PID 1, which is usually the init(1) program. The most common argument that is passed to the init process is the word 'single' which instructs it to boot the computer in single user mode, and not launch all the usual daemons. Check the manual page for the version of init(1) installed on your system to see what arguments it accepts.

Those parameters only do anything because they're passed on to the init(1) launched by the kernel at the top-non-containerized-level, and require that it be systemd to do anything. Which it isn't, so they don't.

(Now, if someone had a lot of time on their hands, they could modify genie so that it pulled the initial kernel command line out of the /proc/cmdline file, parsed out all the systemd.* parameters, and passed them on to the systemd it spawns inside the bottle.

That wouldn't solve this particular issue, since the cgroup hierarchy is already established by the time genie can start its containerized systemd, and the rest of the potential use cases are obscure enough that it's down on my dogwash-priority list. But hey, if anyone wants to implement it and PR me, they can go right ahead.)
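The filtering step described above could look roughly like this. It is a hypothetical sketch, not part of genie, and it assumes the arguments are whitespace-separated with no quoted values containing spaces:

```sh
#!/bin/sh
# systemd_args_from_cmdline [CMDLINE_FILE]
# Hypothetical helper: prints every systemd.* argument found on the
# kernel command line, one per line. CMDLINE_FILE defaults to
# /proc/cmdline. Quoted values containing spaces are not handled.
systemd_args_from_cmdline() {
    file=${1:-/proc/cmdline}
    # One token per line; keep only tokens starting with "systemd.".
    # "|| true" keeps the exit status 0 when there are no matches.
    tr ' ' '\n' < "$file" | grep '^systemd\.' || true
}
```

The output is meant to be word-split onto whatever command launches the containerized systemd. As noted above, this still would not change the cgroup layout, which is fixed before genie runs.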

@lightmelodies

lightmelodies commented May 8, 2021

After a few tries, I got cgroup v2 working with the following steps:

  1. Add kernelCommandLine=systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all in .wslconfig.
  2. Add cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0 in fstab.
  3. Run sudo mount -a (this step must happen before you start systemd).
  4. Now start systemd (I am using subsystemctl, but genie should work similarly).

dmesg will show some errors since we set cgroup_no_v1=all; just ignore them.

[    7.414936] cgroup: Disabled controller 'cpuset'
[    7.414942] init: (1) ERROR: ConfigInitializeCgroups:1685: mount cgroup cpuset failed 22
[    7.414949] cgroup: Disabled controller 'cpu'
[    7.414953] init: (1) ERROR: ConfigInitializeCgroups:1685: mount cgroup cpu failed 22
[    7.414959] cgroup: Disabled controller 'cpuacct' 

Run ls /sys/fs/cgroup: the cgroup v2 controllers should have been created correctly by systemd.

cgroup.controllers
cgroup.max.depth
cgroup.max.descendants
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cpuset.cpus.effective
cpuset.mems.effective
cpu.stat
dev-hugepages.mount
dev-mqueue.mount
init.scope
io.stat
memory.stat
sys-fs-fuse-connections.mount
sys-kernel-debug.mount
sys-kernel-tracing.mount
system.slice
user.slice

Also check with docker info

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.12.0-sukasuka-kernel+
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 11.7GiB
 Name: canoe
 ID: ILRY:4MYO:R7F5:2KLA:7TQG:A3PR:D4HL:SY37:5Z7I:JE26:BGPK:HS6E
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

@cerebrate

@lightmelodies Interesting.

I have tried that my own self, but all I get is the crash at the end of kernel boot mentioned above (#6662 (comment)).

Can I ask what Windows build you're on, and whether you're using a custom kernel? (And if so, please send .config file?)

@cerebrate

Oh, wait.

Add kernelCommandLine=systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all in wsl.config.

You put this in /etc/wsl.config inside the distro, not ~\.wslconfig in Windows? That would be a no-op, since kernelCommandLine only exists as an option in the latter, and would explain why you don't get the crash.

And, curiously enough, I can duplicate your results by just mounting the cgroup2 hierarchy over /sys/fs/cgroup before systemd starts. This doesn't disable cgroups v1 in the kernel (as you can confirm by firing up a second distribution and looking at its /sys/fs/cgroup) or stop its hierarchy from being created/used by earlier processes, but mounting the cgroup2 hierarchy over the hybrid cgroup hierarchy does convince the bottle-container systemd and its children that they should operate in unified mode, not hybrid mode.

I'll leave it up to someone with more cgroups knowledge than me to say whether or not this is actually useful in non-cosmetic ways (or whether it solves @PavelSosin-320's problem). I'm not averse to adding a "unified-cgroups" option to genie to enable this automagically, but I'd prefer to know if it's actually useful first.

@cerebrate

As a side note, in retrospect, having both wsl.config and .wslconfig existing with disparate functions seems like a bit of a naming oops, what?

@lightmelodies

lightmelodies commented May 8, 2021

As a side note, in retrospect, having both wsl.config and .wslconfig existing with disparate functions seems like a bit of a naming oops, what?

Sorry for the typo; I just set kernelCommandLine in .wslconfig. I am still using Windows build 19402 with a custom 5.12 kernel, but the default 5.4.72 kernel also works. Maybe some change in the Insider builds added a check to the init process that fails when cgroup v1 is disabled.

@cerebrate

@lightmelodies Ah, right. Guess so, then, since on the current dev build cgroups_no_v1 reliably breaks in it with both the stock and my custom kernel.

I am curious, though - if you don't set the kernelCommandLine, but you do mount the cgroup2 fs, does it behave any differently?

That seems to get systemd etal. running in unified mode for me even without the kernel part.

@lightmelodies

@lightmelodies Ah, right. Guess so, then, since on the current dev build cgroups_no_v1 reliably breaks in it with both the stock and my custom kernel.

I am curious, though - if you don't set the kernelCommandLine, but you do mount the cgroup2 fs, does it behave any differently?

That seems to get systemd etal. running in unified mode for me even without the kernel part.

If I don't set cgroup_no_v1=all, I can still mount cgroup v2 over /sys/fs/cgroup using fstab, but mount -l shows the cgroup v1 mounts as well:

cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)  
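The listing above has cgroup2 mounted on /sys/fs/cgroup on top of the per-controller v1 mounts. To tell that state apart from a clean unified or hybrid layout, one can scan /proc/mounts (device, mount point, type, options) rather than the mount -l output. A hypothetical sketch; the labels it prints are my own naming, not systemd or Docker terminology:

```sh
#!/bin/sh
# classify_cgroup_mounts [MOUNTS_FILE]
# Hypothetical helper: reads /proc/mounts-style lines
# ("device mountpoint fstype options dump pass") and prints one of:
#   unified     cgroup2 on /sys/fs/cgroup, no v1 mounts
#   v2-over-v1  cgroup2 on /sys/fs/cgroup, but v1 mounts still present
#   hybrid      v1 mounts plus cgroup2 on /sys/fs/cgroup/unified
#   legacy      v1 mounts only
#   none        no cgroup mounts found
classify_cgroup_mounts() {
    awk '
        $3 == "cgroup"                                    { v1++ }
        $3 == "cgroup2" && $2 == "/sys/fs/cgroup"         { v2root = 1 }
        $3 == "cgroup2" && $2 == "/sys/fs/cgroup/unified" { v2unified = 1 }
        END {
            if (v2root && v1 == 0)     print "unified"
            else if (v2root && v1 > 0) print "v2-over-v1"
            else if (v1 > 0)           print (v2unified ? "hybrid" : "legacy")
            else                       print "none"
        }
    ' "${1:-/proc/mounts}"
}
```

On a layout like the one listed above, this would report v2-over-v1, which matches the observation below that Docker sees cgroup v2 but no usable controllers.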

docker info also shows Cgroup Version: 2, but with the following warnings, so I don't think it really works:

WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpu shares support
WARNING: No cpuset support
WARNING: No io.weight support
WARNING: No io.weight (per device) support
WARNING: No io.max (rbps) support
WARNING: No io.max (wbps) support
WARNING: No io.max (riops) support
WARNING: No io.max (wiops) support    

systemd.unified_cgroup_hierarchy=1 seems to be unnecessary in both cases.
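The warnings above suggest that the v2 root does not offer the cpu, cpuset, or io controllers. One way to confirm that is to compare the space-separated list in cgroup.controllers with the controllers you need. A hypothetical sketch; the function name and the default list are illustrative only:

```sh
#!/bin/sh
# missing_v2_controllers CGROUP_DIR [CONTROLLER...]
# Hypothetical helper: prints each requested controller that is NOT
# listed in CGROUP_DIR/cgroup.controllers (a space-separated list on
# a cgroup v2 mount). Without arguments it checks an illustrative
# default set of controllers commonly used by container runtimes.
missing_v2_controllers() {
    dir=$1
    shift
    [ $# -gt 0 ] || set -- cpu cpuset io memory pids
    # Pad with spaces so every name can be matched as " name ".
    available=" $(cat "$dir/cgroup.controllers") "
    for ctrl in "$@"; do
        case $available in
            *" $ctrl "*) ;;          # available
            *) echo "$ctrl" ;;       # missing
        esac
    done
}
```

missing_v2_controllers /sys/fs/cgroup prints nothing when all of the checked controllers are available at the root.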

@PavelSosin-320
Author

I tested the systemd boot process by running systemctl daemon-reload and systemctl daemon-reexec and found that the result is completely stable, exactly the same as when the distro is initialized with genie. Since daemon-reload and daemon-reexec don't involve systemd-genie, I don't see any reason to change anything in genie.
Both the code of the cgroup v1 package in the Microsoft WSL2 kernel and the cgroup redesign and re-implementation are the work of the same person, https://github.com/htejun, a few months apart. The problem is purely political: since WSL2 was released more than a year ago, MS has insisted that systemd is redundant, but cgroup v2 is useless without systemd. Some WSL distros published and sold via the MS Store are no more useful than WSL1 (Bash) distros when it comes to systemd. Actually, all systemd-based Linux distros published in the MS Store as LTS or dev releases are junk that isn't worth a penny unless systemd is somehow added.
Now MS is trying to avoid the Linux kernel standard governed by Red Hat, exactly as it did with JScript some years ago. Everybody knows what happened later.
What is kernel 5.10.16.3-microsoft-standard-WSL2? Is it 5.10 or not? Is it 5.10 with unpublished restrictions and known bugs? Where is the "Microsoft Linux kernel standard" published?

@cerebrate

I checked. There is exactly one difference between cgroup-v1.c in the latest Microsoft WSL kernel and the canonical straight-from-Linus's-own-repo 5.10 release, and that difference is a non-Microsoft patch that was added to said canonical kernel in 5.11-rc3.

I am as eager as the next chap to see all these things made to work, but before we go making allegations, please remember that diff is your friend.

@cerebrate

@lightmelodies Makes sense. Seems like there's not a lot of point in adding support for that way of doing things, then.

Thanks for testing it for me.

@nunix

nunix commented May 11, 2021

oh a rootless discussion 😄

I managed to get NERDctl fully working with ContainerD in rootless mode (writing the blog post now); however, it works with cgroup v1 and this "hybrid" mode.

Also, please note that .wslconfig affects the underlying VM for WSL2 itself. When you look at the boot process, our distros run on top of it (with KVM, it seems).
And not all Linux distros use SystemD (I'm not defending anything here, just laying out facts 😉), so what would be neat is an extra "Kernel" option inside wsl.conf.

I will try the cgroup_no_v1=all setting, since I already had systemd.unified_cgroup_hierarchy=1 and, as @benhillis said, it's not honored.

PS: I'm switching to kernel 5.13, but for a "nice" rootless experience, you might want to jump to 5.11 at least, as it's where fuse-overlayfs is implemented (and 5.12 has the rootless mount capabilities).

Looking forward to your tests 😄

@nunix

nunix commented May 13, 2021

So, after some testing, I could get cgroups V2 working somehow (see screenshot below with podman)

There are still manual steps to perform, but here is, in a nutshell, what I've done:

  • OS: Win10 Insider Dev v21376
  • WSL2 distro: Fedora 34
  • SystemD: script by Diddledan (https://github.com/diddledan/one-script-wsl2-systemd/tree/build-21286%2B)
  • Option in the .wslconfig file: cgroup_no_v1=named (the all value crashes the WSL2 VM)
  • Start the distro from powershell with the root user: wsl -d fedora34 -u root
    • there will be an error on the debug console: Failed to mount cgroup at /sys/fs/cgroup/systemd: No such file or directory
  • Ctrl+C to "start" the shell
  • Unmount all the cgroups: umount /sys/fs/cgroup/*
  • Unmount the root cgroup: umount /sys/fs/cgroup
  • Mount the root cgroup in V2: mount -t cgroup2 cgroup2 /sys/fs/cgroup -o rw,nosuid,nodev,noexec,relatime,nsdelegate
  • In another terminal/tab, start a new session normally
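The manual steps above can be collected into a small script. This is a hypothetical sketch that assumes it runs as root inside the distro before systemd is started; DRY_RUN is an illustrative switch of this sketch that only prints the commands:

```sh
#!/bin/sh
# Hypothetical helper replaying the manual remount steps.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

remount_cgroup2() {
    root=${1:-/sys/fs/cgroup}
    # Unmount each per-controller hierarchy below the root. Symlinks
    # such as cpu -> cpu,cpuacct are skipped so the same mount is not
    # unmounted twice.
    for dir in "$root"/*; do
        [ -d "$dir" ] && [ ! -L "$dir" ] || continue
        run umount "$dir"
    done
    # Unmount the root itself, then mount cgroup2 in its place.
    run umount "$root"
    run mount -t cgroup2 cgroup2 "$root" -o rw,nosuid,nodev,noexec,relatime,nsdelegate
}
```

Running DRY_RUN=1 remount_cgroup2 first shows which mounts it would touch. As lightmelodies points out in the next comment, controllers that stay bound to v1 still won't show up under the new cgroup2 mount.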

While this should work, writing inside the cgroup mount generates errors, so for podman I set pids_limit to 0 (as explained here: https://access.redhat.com/solutions/5913671)

[Screenshot: podman running with cgroups v2]

Hope this provides some additional hints

@lightmelodies

The kernel doc says

All controllers which support v2 and are not bound to a v1 hierarchy are automatically bound to the v2 hierarchy and show up at the root.

So while we can manually unmount cgroup v1 and then mount cgroup v2 to make systemd work in unified mode, no v2 controllers are available, because they are already bound to v1. That's why docker shows such warnings. Unfortunately, I cannot find a way to disable v1 controllers dynamically without cgroup_no_v1=all.
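/proc/cgroups is one place to see which controllers are in this state. According to cgroups(7), its hierarchy column is 0 when a controller is not attached to a v1 hierarchy (for example, because it is bound to v2), so any enabled controller with a non-zero ID cannot appear in the v2 root. A hypothetical sketch:

```sh
#!/bin/sh
# v1_bound_controllers [CGROUPS_FILE]
# Hypothetical helper: prints enabled controllers that are still
# attached to a cgroup v1 hierarchy. Columns of /proc/cgroups are:
#   subsys_name  hierarchy  num_cgroups  enabled
# A hierarchy ID of 0 means "not on a v1 hierarchy", so only
# non-zero IDs with enabled == 1 are reported.
v1_bound_controllers() {
    awk '!/^#/ && $2 != 0 && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

With cgroup_no_v1=all taking effect, this would be expected to print nothing.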

@PavelSosin-320
Author

@nunix I use Arkane Systems' systemd-genie, which offers almost 100% of systemd functionality, including systemd --user, with only one dependency (.NET 5.0), and exists for all popular distros. So, on the one hand, Diddledan's home-brewed script hardly satisfies me; on the other hand, genie is able to support cgroup v2 via systemd for both root and user sessions. The problem is only in the kernel, which is called 5.10 but lacks the cgroup v2 module. systemd once had a feature to convert v1 to v2, but today, as you mentioned, mixing the versions does not work.

@sarim

sarim commented Sep 18, 2022

@hypeitnow I don't think you need to edit containers.conf, but the rest is okey. Though you might want to try genie first. If successful, then try distrod and my hack :)

@hypeitnow

hypeitnow commented Sep 19, 2022

@hypeitnow I don't think you need to edit containers.conf, but the rest is okey. Though you might want to try genie first. If successful, then try distrod and my hack :)

I did so, but unfortunately:

  1. fstab is completely ignored, e.g. in spite of having
    [automount]
    mountFsTab = true
    and
    [boot]
    command = mount --make-shared /
    in my wsl.conf this path is still mounted as private not shared so
  2. When I run mount -a, podman still sees cgroup v1 and cgroupfs as the cgroup manager, though for root it is systemd
  3. Mounting all with mount -a breaks my VSCODE :(

I will try my luck with kernel release 5.15 from 20.08; it supposedly implements the miscellaneous cgroup controller, so maybe it'll pass

EDIT

I did steps 1 to 4 after compiling my 5.15 kernel, and I can call it a success:

22:37:13 ❯ findmnt -o PROPAGATION /
PROPAGATION
shared


22:37:16 ❯ podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.1.3, commit: unknown'

The only thing left is to eliminate this irritating warning

WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers
Has anyone succeeded in enabling this via the [boot] command setting, or by forcing the fstab mount, in wsl.conf?
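findmnt -o PROPAGATION / reports what the kernel exposes in /proc/self/mountinfo, where a shared mount carries a "shared:N" optional field before the "-" separator. A hypothetical sketch of a check that could run before mount --make-shared / is attempted:

```sh
#!/bin/sh
# is_shared_mount [MOUNTPOINT] [MOUNTINFO_FILE]
# Hypothetical helper: succeeds if MOUNTPOINT (default "/") has shared
# propagation. In mountinfo, field 5 is the mount point and the
# optional fields (e.g. "shared:1", "master:2") run from field 7 up to
# the "-" separator. If the mount point is stacked, the last
# (topmost) entry wins.
is_shared_mount() {
    awk -v mp="${1:-/}" '
        $5 == mp {
            found = 1; shared = 0
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) shared = 1
        }
        END { exit !(found && shared) }
    ' "${2:-/proc/self/mountinfo}"
}
```

As root, is_shared_mount / || mount --make-shared / would only change propagation when it is not already shared.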

@sarim

sarim commented Oct 29, 2022

WSL does not use systemd so that setting is not being respected.
(@benhillis, 3rd post)

With systemd now being officially supported, is there any plan to officially support cgroups v2? At least refactoring that piece of init code that's dependent on cgroups v1 memory controller would be very nice :)

@cerebrate

Well, looks like there's going to have to be a plan. From the release notes from the just-released systemd 252:

* We intend to remove cgroup v1 support from systemd release after the
 end of 2023. If you run services that make explicit use of cgroup v1
 features (i.e. the "legacy hierarchy" with separate hierarchies for
 each controller), please implement compatibility with cgroup v2 (i.e.
 the "unified hierarchy") sooner rather than later. Most of Linux
 userspace has been ported over already.

@cerebrate

Looks like we can use cgroup_no_v1=all in WSL 1.0.1, and doing so and mounting cgroup2 in /etc/fstab with the line

cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0

everything seems to be running happily in cgroups v2-only mode, apart from some irritating errors in the dmesg output, none of which seem to actually affect anything.
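The fstab step can be made repeatable with a small helper that only appends the line above if /sys/fs/cgroup has no entry yet. A hypothetical sketch; it assumes kernelCommandLine = cgroup_no_v1=all is already set in .wslconfig and that WSL is restarted (wsl.exe --shutdown) afterwards:

```sh
#!/bin/sh
# add_cgroup2_fstab [FSTAB]
# Hypothetical helper: appends the cgroup2 line to /etc/fstab (or
# FSTAB) unless a non-comment entry for /sys/fs/cgroup already
# exists, so running it twice does not duplicate the line.
add_cgroup2_fstab() {
    fstab=${1:-/etc/fstab}
    line='cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0'
    # Field 2 of an fstab entry is the mount point; skip comment lines.
    if awk '!/^[ \t]*#/ && $2 == "/sys/fs/cgroup" { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    # Make sure the existing last line is newline-terminated first.
    if [ -s "$fstab" ] && [ -n "$(tail -c 1 "$fstab")" ]; then
        printf '\n' >> "$fstab"
    fi
    printf '%s\n' "$line" >> "$fstab"
}
```

It needs to run as root when targeting /etc/fstab.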

@sarim

sarim commented Nov 23, 2022

I can also confirm it seems to be working in WSL 1.0.1.0. podman info shows memory controller, also podman stats can show memory values.

(It might not be relevant, but) you might have to reboot after wsl --update. I updated WSL with wsl --update and confirmed it using wsl --version, but the memory controller was still unavailable and mounted in cgroups v1. I tried wsl --shutdown, and on the second try my PC crashed and rebooted; after that, everything seems to be working okay.

@ilan-schemoul

If I enable kernelCommandLine=cgroup_no_v1=all, WSL cannot start; it reports:
unrecoverable error (it's written in my native language, so I'm not sure this is the correct translation)
Error code: Wsl/Service/CreateInstance/CreateVm/ConfigureNetworking/0x8000ffff

@sarim

sarim commented Nov 30, 2022

@ilan-schemoul you need to have wsl version 1.0.1.0.

It's noted here:
https://github.com/microsoft/WSL/releases/tag/1.0.1

Don't fail to start if cgroup mounts fail [GH 8981]

#8981

The funny thing is that the referenced issue is only from Oct 9 and has already been noticed and "fixed", while this issue has been discussing it for many months :P

@ilan-schemoul
Copy link

Thank you, updating to 1.0.1.0 solved my issue!

@ilan-schemoul

ilan-schemoul commented Nov 30, 2022

I followed this #6662 (comment), created a directory in the cgroup, went into it, and then ran this very simple command as root:
# echo "+cpu +io +memory" > cgroup.subtree_control

It prints "bash: echo: write error: No such file or directory", even though the file exists.
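A plausible explanation, not confirmed in this thread: the kernel rejects a write to cgroup.subtree_control with ENOENT when a requested controller is not listed in that cgroup's own cgroup.controllers, and a newly created child only lists what its parent has enabled in its cgroup.subtree_control. A hypothetical helper that builds the write from the controllers that are actually available:

```sh
#!/bin/sh
# subtree_enable_line CGROUP_DIR CONTROLLER...
# Hypothetical helper: prints the string to write into
# CGROUP_DIR/cgroup.subtree_control, e.g. "+cpu +memory", keeping
# only controllers that CGROUP_DIR/cgroup.controllers lists.
subtree_enable_line() {
    dir=$1
    shift
    # Pad with spaces so every name can be matched as " name ".
    available=" $(cat "$dir/cgroup.controllers") "
    out=
    for ctrl in "$@"; do
        case $available in
            *" $ctrl "*) out="$out +$ctrl" ;;
        esac
    done
    # Drop the single leading space added by the loop.
    printf '%s\n' "${out# }"
}
```

If it prints fewer controllers than requested, they would first have to be enabled in the parent's cgroup.subtree_control, starting from /sys/fs/cgroup.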

@josesa-xx

I've added the line to /etc/fstab as suggested in #6662 (comment) by @cerebrate and after wsl.exe --shutdown both docker and podman info show using cgroup v2.

Tested both with 1.0.1.0 and 1.0.3.0, with systemd enabled and kernelCommandLine=cgroup_no_v1=all

@f-bn

f-bn commented Dec 4, 2022

Can confirm the @cerebrate trick is working nicely here too on WSL 1.0.3.0 (Docker and Minikube are not complaining for now 👍). This is a step forward, but native cgroups v2 support would be better!

@cerebrate

Given the above note regarding the intent of the systemd maintainers, I'm confident it will be coming before too long.

@pierreown

pierreown commented Sep 15, 2023

If systemd is enabled, the startup program changes from /init to /sbin/init, so you can try this:
you can try it:

  1. Add kernelCommandLine Args in .wslconfig:
kernelCommandLine="cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1"
  2. Run the script setup-cgroup2-wsl-init.sh:
#!/usr/bin/env bash

rename=/sbin/init-origin

if [ -f $rename ]; then
    exit 0
fi

mv -f /sbin/init $rename

cat <<EOF | tee /sbin/init >/dev/null
#!/bin/sh

if [ \$$ -eq 1 ]; then
    umount -R /sys/fs/cgroup >/dev/null 2>&1
    if ! [ -d /sys/fs/cgroup ]; then
        mkdir -p /sys/fs/cgroup >/dev/null 2>&1
    fi
    mount -t cgroup2 -o rw,nosuid,nodev,noexec,relatime,nsdelegate cgroup2 /sys/fs/cgroup >/dev/null 2>&1
fi
exec "$rename" "\$@"
EOF
chmod +x /sbin/init

cat <<EOF | tee /sbin/init-reset >/dev/null
#!/bin/sh
mv -f $rename /sbin/init && rm -f /sbin/init-reset
EOF
chmod +x /sbin/init-reset
  3. Restart WSL

@sarim

sarim commented Sep 15, 2023

@pierre-primary Can you explain your script a bit? How does it compare to just adding the cgroup2 mount to /etc/fstab?

@pierreown

pierreown commented Sep 16, 2023

@sarim

This approach ensures that the cgroup2 mount points are adjusted before /sbin/init starts, whereas /etc/fstab is processed by /sbin/init itself, which is not ideal in terms of ordering. However, this only works when systemd is enabled in WSL: in that case the startup program becomes /sbin/init, which can be replaced by a custom script. By default, WSL re-mounts /init every time, so /init itself cannot be replaced by a custom script.

At the same time, this approach changes the process name. If that matters for your setup, here is a more compatible script:

#!/bin/sh

find_shell() {
    while [ $# -gt 0 ]; do
        if shell=$(command -v "$1"); then
            echo "$shell"
            return 0
        fi
        shift
    done
    return 1
}

echo_cgroup2_mount() {
    cat <<"EOF"
if [ $$ -eq 1 ]; then
    umount -R /sys/fs/cgroup >/dev/null 2>&1
    if ! [ -d /sys/fs/cgroup ]; then
        mkdir -p /sys/fs/cgroup >/dev/null 2>&1
    fi
    mount -t cgroup2 -o rw,nosuid,nodev,noexec,relatime,nsdelegate cgroup2 /sys/fs/cgroup >/dev/null 2>&1
fi
EOF
}

target_file=/sbin/init
rename_file=/sbin/init-origin
reset_file=/sbin/init-reset

# Avoid changing it again
if [ -f $rename_file ]; then
    exit 0
fi

# Rename the origin file
mv -f $target_file $rename_file

# If it is a symbolic link, get the real path
real_file=$(readlink $rename_file)
if [ -z "$real_file" ]; then
    real_file=$rename_file
else
    real_file=$(realpath $rename_file)
fi

# Create the replacement script
{
    if shell=$(find_shell ash bash); then
        echo "#!$shell"
        echo_cgroup2_mount
        echo 'exec -a "$0" "'"$real_file"'" "$@"'
    else
        echo "#!/bin/sh"
        echo_cgroup2_mount
        echo 'exec '"$real_file"' "$@"'
    fi
} | tee $target_file >/dev/null
chmod +x $target_file

# Create a rollback script
cat <<EOF | tee $reset_file >/dev/null
#!/bin/sh
mv -f $rename_file $target_file && rm -f $reset_file
EOF
chmod +x $reset_file

@pierreown

I just realized that kernelCommandLine = "cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1" now works properly on WSL2 2.0.0+.

@Speeddymon

Confirming, WSL2, AlmaLinux9, with a custom kernel, adding kernelCommandLine above without the /sbin/init replacement wrapper script, minikube gets a lot farther for me. Having some trouble with a userns issue, but it's not related to this. Thanks @pierre-primary !

@4-FLOSS-Free-Libre-Open-Source-Software

In the meantime, you could use a WSL2 kernel with a fixed CONFIG_CMDLINE: https://github.com/Locietta/xanmod-kernel-WSL2?tab=readme-ov-file#usage
