Cannot use kernel newer than kernel-5.18.11-200.fc36 #11

Closed
cfergeau opened this issue Oct 4, 2022 · 17 comments

Comments

@cfergeau
Collaborator

cfergeau commented Oct 4, 2022

https://koji.fedoraproject.org/koji/buildinfo?buildID=2000811 has been used in the latest crc's podman bundle and this kernel was working fine.
However, I've been unable to boot anything newer than this on my Mac M1 using Code-Hex/vz. vfkit would have the same issue.
I tried kernel-5.18.13-200.fc36 (https://koji.fedoraproject.org/koji/buildinfo?buildID=2020964), several newer 5.19.x versions, kernel-6.0.0-54.fc38, ... and none of these seemed to be able to boot :(

@cfergeau
Collaborator Author

cfergeau commented Oct 4, 2022

Might be similar to Code-Hex/vz#51 (comment)

@cfergeau
Collaborator Author

cfergeau commented Oct 4, 2022

Description is inaccurate, as the 4.2.0 podman bundle is actually using /Users/teuf/.crc/cache/crc_podman_vfkit_4.2.0_arm64/vmlinuz-5.18.18-200.fc36.aarch64 and this version works correctly. I need to understand what's going on!

@cfergeau
Collaborator Author

cfergeau commented Oct 5, 2022

5.19 kernels don't seem to boot with vfkit/Code-Hex/vz. I tried the 5.19.4 and 5.19.13 fedora kernels. 5.18 kernels have been ok in my testing.

@cfergeau
Collaborator Author

cfergeau commented Oct 6, 2022

x86_64 is similarly impacted. I'm trying to get early logs from the virtualization framework, but it's not that easy :-/

@cfergeau
Collaborator Author

Did some more testing. I tried https://github.com/evansm7/vftool, which also uses Apple's virtualization framework but is implemented in Objective-C. It fails in the same way. One thing I had not noticed earlier is that with these failing kernels, the virtual machine reports an error state. I haven't found details about what caused this error state, though.
I tried Debian kernels, and both kernels I tried (5.10.0-18 and 6.0.0-1) failed with VZVirtualMachineStateError.

I tried qemu, and I was able to boot all the kernels I tried, for example:

qemu-system-aarch64 -nographic -append "console=ttyAMA0 debug" -cpu max -accel hvf --machine virt -kernel ~/dev/beaker-kernels/vmlinux-6.0.0-54.fc38.aarch6 -initrd ~/dev/beaker-kernels/initramfs-6.0.0-54.fc38.aarch64.img -m 1024

@cfergeau
Collaborator Author

On x86_64, 5.19 kernels from f36 are booting fine. 6.0 f36 kernels fail to boot, the VM gets in an error state.

@cfergeau
Collaborator Author

I tried Ubuntu 6.0 kernels from https://launchpad.net/~tuxinvader on an x86_64 MacBook, and they fail with the same problem. This happens on both macOS 11 and macOS 12.
I rebuilt Fedora 5.19 and 6.1 kernels on a RHEL 8 machine and tested them on the x86_64 macOS 11 MacBook: 5.19 works, and 6.1 fails in the same way.

Last but not least, I upgraded the M1 machine from macOS 12 to macOS 13, and 5.19+ kernels are now working fine! The latest macOS 12 release (12.6.1) still had the issue.

@cfergeau
Collaborator Author

cfergeau commented Nov 4, 2022

I managed to find the problematic commit(s) for kernel 5.19 / 6.x. This was tested on my macOS11 / x86_64 macbook.

Early 5.19 kernels fail to boot until https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b1c742407571eff58b6de9881889f7ca7c4b4dc is applied (5.19.6 and newer should be fine)

6.x kernels need https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd514e58f12b211d638dbf6f791fa18d854f09c to be reverted and then they work fine (only tested on x86_64 so far)
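The x86_64 bisect results above can be summarized as a small lookup. The sketch below is a hypothetical C helper (`expected_boot_x86_64_macos11` and the enum names are made up for illustration; they are not part of vfkit or the kernel). It only encodes what was reported in this thread for unpatched kernels on a macOS 11 Intel host, and it assumes that kernels older than 5.19 boot, as the 5.18 reports suggest.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcomes, named after the commits bisected above. */
typedef enum {
    BOOT_UNKNOWN,               /* version string could not be parsed */
    BOOT_OK,                    /* expected to boot as-is */
    BOOT_NEEDS_4B1C7424,        /* early 5.19: needs commit 4b1c7424 (5.19.6+ reported fine) */
    BOOT_NEEDS_REVERT_6CD514E5  /* 6.x: needs commit 6cd514e5 reverted */
} boot_expectation;

/* Encodes the observations for unpatched kernels on a macOS 11 x86_64 host.
 * Accepts strings such as "5.19.13-200.fc36"; only major.minor.patch is used. */
static boot_expectation expected_boot_x86_64_macos11(const char *version)
{
    int major, minor, patch;

    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return BOOT_UNKNOWN;
    if (major >= 6)
        return BOOT_NEEDS_REVERT_6CD514E5;
    if (major == 5 && minor == 19 && patch < 6)
        return BOOT_NEEDS_4B1C7424;
    /* 5.19.6+ and, by assumption, anything older than 5.19 */
    return BOOT_OK;
}
```

For example, `"5.19.4-200.fc36"` maps to `BOOT_NEEDS_4B1C7424` and `"6.0.0-54.fc38"` to `BOOT_NEEDS_REVERT_6CD514E5`. Distribution kernels may carry backports, so the upstream version number is only a rough guide.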

@cfergeau
Collaborator Author

cfergeau commented Nov 4, 2022

I mentioned the breakage in https://bugzilla.kernel.org/show_bug.cgi?id=215989 , and this was forwarded to the LKML https://lkml.org/lkml/2022/11/4/780

@cfergeau
Collaborator Author

cfergeau commented Nov 4, 2022

The M1 boot failure is apparently something else, as a kernel patched with the fixes above still does not boot on the M1.

@cfergeau
Collaborator Author

cfergeau commented Nov 7, 2022

I've bisected the M1 failure to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e64b862c4823ab53aac028042abd918c2f27041
With just the change below (skipping the read of ID_AA64SMFR0_EL1), I can boot the latest Fedora kernel on a macOS 12 M1:

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 28d4f442b0bc..a17c876696ee 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -432,7 +432,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
        info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
        info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
        info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
+#if 0
        info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
+#endif
 
        if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
                info->reg_gmid = read_cpuid(GMID_EL1);
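The function in the diff already reads GMID_EL1 only when ID_AA64PFR1_EL1 advertises MTE. A hypothetical workaround in the same spirit would read ID_AA64SMFR0_EL1 only when ID_AA64PFR1_EL1 advertises SME (field bits [27:24]). The sketch below is not the upstream kernel code and not a proposed patch: `read_smfr0_guarded`, `fake_smfr0`, and the `read_reg_fn` callback are illustrative names, and the callback stands in for `read_cpuid()` so the logic can run outside the kernel. It also assumes that a CPU without SME is expected to report 0 for this register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SME field of ID_AA64PFR1_EL1 is bits [27:24]. */
#define PFR1_SME_SHIFT 24
#define PFR1_SME_MASK  0xfULL

/* Stand-in for read_cpuid(): a real kernel executes an MRS instruction. */
typedef uint64_t (*read_reg_fn)(void);

static unsigned int pfr1_sme(uint64_t pfr1)
{
    return (unsigned int)((pfr1 >> PFR1_SME_SHIFT) & PFR1_SME_MASK);
}

/* Only access ID_AA64SMFR0_EL1 when PFR1 advertises SME; otherwise
 * return 0 without touching the register. */
static uint64_t read_smfr0_guarded(uint64_t pfr1, read_reg_fn read_smfr0)
{
    assert(read_smfr0 != NULL);
    return pfr1_sme(pfr1) ? read_smfr0() : 0;
}

/* Fake register reader for illustration: counts how often it is called. */
static int smfr0_reads;
static uint64_t fake_smfr0(void)
{
    smfr0_reads++;
    return 0x1ULL;
}
```

With `pfr1 = 0` (no SME, as on the M1) the register is never accessed, so a hypervisor that mishandles the access would never see it. Whether such a guard would be acceptable upstream is a separate question.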

@cfergeau
Collaborator Author

cfergeau commented Nov 7, 2022

I filed an issue with Apple for the M1 problem to ask them if they can fix this for macOS 12.

@cfergeau
Collaborator Author

cfergeau commented Nov 8, 2022

> Last but not least, I upgraded the m1 machine from 12 to 13, and 5.19+ kernels are now working fine! Latest macOS (12.6.1) still had the issue.

On Intel with macOS 13, I can also successfully boot 6.x kernels from fedora!

@cfergeau
Collaborator Author

cfergeau commented Nov 8, 2022

> I've bisected the M1 failure to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e64b862c4823ab53aac028042abd918c2f27041 With just this change, I can boot latest fedora kernel on a macOS12 M1

I've sent http://lists.infradead.org/pipermail/linux-arm-kernel/2022-November/788096.html about this.

@cfergeau
Collaborator Author

cfergeau commented Nov 9, 2022

> I've sent http://lists.infradead.org/pipermail/linux-arm-kernel/2022-November/788096.html about this.

And it looks like a kernel workaround for the M1 issue is going to be a hard sell http://lists.infradead.org/pipermail/linux-arm-kernel/2022-November/788450.html
Let's wait and hope that Apple says something about this problem.

@cfergeau cfergeau moved this to In Progress in Project planning: crc Dec 14, 2022
cfergeau added a commit to cfergeau/snc that referenced this issue Dec 20, 2022
Because of crc-org/vfkit#11 we need to use an
older kernel with our macOS bundles.
This is fixed in macOS 13, but we want to keep compatibility with macOS
12 for a while.
With the changes made in the previous commits, this is relatively
straightforward to add.
This uses the latest 5.18 fedora kernel on M1, and latest 5.19 fedora
kernel on Intel macbooks.
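The selection described in this commit message (newest 5.18 Fedora kernel on M1, newest 5.19 Fedora kernel on Intel) could be sketched as follows. This is a hypothetical C helper, not the actual snc code: the function name, the `aarch64`/`x86_64` arch strings, and picking from a list of version strings are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: index of the newest kernel in `candidates` whose
 * series is allowed for macOS 12 compatibility (5.18 on aarch64, 5.19 on
 * x86_64), or -1 if none qualifies or the arch is not handled. */
static int pick_bundle_kernel(const char *arch, const char *const *candidates, size_t n)
{
    const int want_major = 5;
    int want_minor;
    int best = -1, best_patch = -1;

    assert(arch != NULL && candidates != NULL);
    if (strcmp(arch, "aarch64") == 0)
        want_minor = 18;
    else if (strcmp(arch, "x86_64") == 0)
        want_minor = 19;
    else
        return -1;

    for (size_t i = 0; i < n; i++) {
        int major, minor, patch;

        /* Skip strings that do not start with major.minor.patch. */
        if (sscanf(candidates[i], "%d.%d.%d", &major, &minor, &patch) != 3)
            continue;
        if (major == want_major && minor == want_minor && patch > best_patch) {
            best = (int)i;
            best_patch = patch;
        }
    }
    return best;
}
```

Given `{"5.18.11-200.fc36", "5.18.18-200.fc36", "5.19.13-200.fc36", "6.0.0-54.fc38"}`, it returns index 1 for `aarch64` and index 2 for `x86_64`. Only the upstream patch level is compared, so two Fedora builds of the same upstream version are treated as equal.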
cfergeau added a commit to cfergeau/snc that referenced this issue Jan 5, 2023
praveenkumar pushed a commit to crc-org/snc that referenced this issue Jan 6, 2023
praveenkumar added a commit to praveenkumar/snc that referenced this issue May 8, 2023
Because of crc-org/vfkit#11 we need to use an older kernel with our macOS bundles.
This is fixed in macOS 13, but we want to keep compatibility with macOS 12 for a while.
This replaces the RHEL 9.2 kernel with the one shipped in RHEL 9.0.
praveenkumar added a commit to praveenkumar/snc that referenced this issue May 22, 2023
Because of crc-org/vfkit#11 we need to use an older kernel with our macOS bundles.
This is fixed in macOS 13, but we want to keep compatibility with macOS 12 for a while.
This replaces the RHEL 9.2 kernel with the one shipped in RHEL 9.0.

Ideally, the RHEL 9.2 kernel, which is still 5.14.x, should work, but on the
RHEL side there are always backports of features, bug fixes, and CVE fixes.

Non-working kernel (on Intel/M1):
- kernel-5.14.0-284.11.1.el9_2.x86_64.rpm

Working kernel (on Intel/M1), which we are downgrading to:
- kernel-5.14.0-70.53.1.el9_0.x86_64.rpm
praveenkumar added a commit to crc-org/snc that referenced this issue May 23, 2023
praveenkumar added a commit to praveenkumar/snc that referenced this issue May 25, 2023
Commit 68c6383 introduced a regression for the OKD
bundle: we never downgraded the kernel for OKD, not even for 4.12,
and OKD has always used Fedora as its base.

During internal discussion, we were not sure how many people use
OKD on macOS, since there is no arm64 support and no bug or issue has been
reported for OKD on macOS due to crc-org/vfkit#11.

With this PR we are still not downgrading the kernel. If there is user
feedback or an issue, we will ask the community to contribute a patch,
given that the logic already exists in the `podman` branch.
praveenkumar added a commit to crc-org/snc that referenced this issue May 26, 2023
We added regression 68c6383 for OKD
bundle since we never downgrade kernel for OKD even in case of 4.12 side
and OKD always used fedora as base.

During internal discussion we are not sure how many folks are consuming
OKD on mac since there is no arm64 support and no bug/issue reported for
OKD on mac due to crc-org/vfkit#11.

With this PR we are still not downgrading kernel and if there is some
user issue/feedback we will ask community to put a patch for it given
that we already have logic in `podman` branch.
@cfergeau
Collaborator Author

Closing this issue as by now it's not unreasonable to ask people having this issue to upgrade to macOS13 or newer.
