Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shuffle.c is compiled with -mavx512* which results in SIGILL on lower CPUs #621

Closed
mgorny opened this issue Jun 21, 2024 · 5 comments · Fixed by #622
Closed

shuffle.c is compiled with -mavx512* which results in SIGILL on lower CPUs #621

mgorny opened this issue Jun 21, 2024 · 5 comments · Fixed by #622

Comments

@mgorny
Copy link
Contributor

mgorny commented Jun 21, 2024

Describe the bug
When compiler supports SSE2/AVX2/AVX512 instructions, the respective -m flags are added to the flags for shuffle.c. As a result, the compiler can compile arbitrary code with these instruction sets. For me, GCC 14 creates a library that uses AVX512 instructions on unsupported CPU and effectively crashes with SIGILL, e.g.:

$ ninja test
[0/1] Running tests...
Test project /tmp/c-blosc2/build
          Start    1: test_plugin_test_ndlz
   1/1746 Test    #1: test_plugin_test_ndlz .....................................***Exception: Illegal  0.11 sec
          Start    2: test_plugin_test_zfp_acc_float
   2/1746 Test    #2: test_plugin_test_zfp_acc_float ............................***Exception: Illegal  0.10 sec
          Start    3: test_plugin_test_zfp_prec_float

For example, plugins/codecs/ndlz/test_ndlz crashes in get_shuffle_implementation:

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   0x555555582115 <get_shuffle_implementation+37>  jne    0x5555555821d0 <get_shuffle_implementation+224>                            │
│   0x55555558211b <get_shuffle_implementation+43>  vmovq  0x10ae55(%rip),%xmm0        # 0x55555568cf78                               │
│   0x555555582123 <get_shuffle_implementation+51>  vmovq  0x10adad(%rip),%xmm1        # 0x55555568ced8                               │
│   0x55555558212b <get_shuffle_implementation+59>  lea    0xe250a(%rip),%rcx        # 0x55555566463c                                 │
│   0x555555582132 <get_shuffle_implementation+66>  lea    0xa2807(%rip),%rax        # 0x555555624940 <unshuffle_generic>             │
│   0x555555582139 <get_shuffle_implementation+73>  lea    0xa2fc0(%rip),%rdx        # 0x555555625100 <bshuf_untrans_bit_elem_scal>   │
│   0x555555582140 <get_shuffle_implementation+80>  vpinsrq $0x1,%rdx,%xmm1,%xmm1                                                     │
│   0x555555582146 <get_shuffle_implementation+86>  vpinsrq $0x1,%rax,%xmm0,%xmm0                                                     │
│   0x55555558214c <get_shuffle_implementation+92>  mov    %rcx,(%rdi)                                                                │
│   0x55555558214f <get_shuffle_implementation+95>  mov    %rdi,%rax                                                                  │
│  >0x555555582152 <get_shuffle_implementation+98>  vinserti32x4 $0x1,%xmm1,%ymm0,%ymm0                                               │
│   0x555555582159 <get_shuffle_implementation+105> vmovdqu %ymm0,0x8(%rdi)                                                           │
│   0x55555558215e <get_shuffle_implementation+110> vzeroupper                                                                        │
│   0x555555582161 <get_shuffle_implementation+113> ret                                                                               │
│   0x555555582162 <get_shuffle_implementation+114> data16 cs nopw 0x0(%rax,%rax,1)                                                   │
│   0x55555558216d <get_shuffle_implementation+125> nopl   (%rax)                                                                     │
│   0x555555582170 <get_shuffle_implementation+128> cmp    $0x208000,%edx                                                             │
│   0x555555582176 <get_shuffle_implementation+134> je     0x5555555821a0 <get_shuffle_implementation+176>                            │
│   0x555555582178 <get_shuffle_implementation+136> vmovq  0x10ad68(%rip),%xmm0        # 0x55555568cee8                               │
│   0x555555582180 <get_shuffle_implementation+144> vmovq  0x10ae30(%rip),%xmm1        # 0x55555568cfb8                               │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7ffff7caa7 In: get_shuffle_implementation                                                 L306  PC: 0x555555582152 

To Reproduce

cmake -G Ninja ${srcdir} -DCMAKE_C_FLAGS='-march=znver2 -O2 -pipe' -DCMAKE_BUILD_TYPE=RelWithDebInfo
ninja
ninja test

Expected behavior
Not crashing, i.e. not using SSE2/AVX2/AVX512 instructions outside specific files intended for them :-).

Logs
CMake output: cmake.txt
Ninja output (for build): build.txt

System information:

  • OS: Gentoo Linux amd64
  • Compiler: GCC 14.1.1
  • Version: 2.15.0

Additional context
n/a

@FrancescAlted
Copy link
Member

Yes, shuffle.c is designed to statically include many SIMD instructions and then dynamically select the correct version at runtime. This worked pretty well for many years now.

I don't have an AMD Zen handy, but I have tried this out with an Intel i13900K and an assortment of different compilers (gcc-12, gcc-13, gcc-14, clang-16, clang-17 and clang-18), and only gcc-14 exhibits this issue. Could it be a bug introduced in gcc-14?

FWIW, removing "-march=znver2" in your example works for me.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 22, 2024

It's not a bug. If you tell compiler to use AVX512, then it is permitted to use AVX512. That's what the flag does, by design. If anything, it just means that GCC 14 improved the optimizer that it finds new optimization possibilities.

@FrancescAlted
Copy link
Member

Ok. Then I'm curious on why removing "-march=znver2" in your example works, as blosc is still telling the compiler to use AVX512.

@mgorny
Copy link
Contributor Author

mgorny commented Jun 22, 2024

It probably triggers a different optimization profile.

mgorny added a commit to mgorny/c-blosc2 that referenced this issue Jun 25, 2024
Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when
compiling `shuffle.c`.  From what I can see, the file itself does not
use any of these intrinsics, and they are only used by functions
declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective
flags are still passed).  This prevents the compiler from incidentally
optimizing the code called independenlty of the runtime CPU check to
these instruction sets, effectively causing `SIGILL` on other CPUs.

I have verified that this fixes the issue on `-march=znver2`, but also
does not cause any issues on `-march=x86-64` and `-march=i686`.

Fixes Blosc#621
@thesamesam
Copy link

Indeed, see https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mmmx (scroll down a bit);

These options enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.

mgorny added a commit to mgorny/c-blosc2 that referenced this issue Jun 25, 2024
Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when
compiling `shuffle.c`.  From what I can see, the file itself does not
use any of these intrinsics, and they are only used by functions
declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective
flags are still passed).  This prevents the compiler from incidentally
optimizing the code called independenlty of the runtime CPU check to
these instruction sets, effectively causing `SIGILL` on other CPUs.

I have verified that this fixes the issue on `-march=znver2`, but also
does not cause any issues on `-march=x86-64` and `-march=i686`.

Fixes Blosc#621
t20100 pushed a commit to t20100/c-blosc2 that referenced this issue Jul 3, 2024
Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when
compiling `shuffle.c`.  From what I can see, the file itself does not
use any of these intrinsics, and they are only used by functions
declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective
flags are still passed).  This prevents the compiler from incidentally
optimizing the code called independenlty of the runtime CPU check to
these instruction sets, effectively causing `SIGILL` on other CPUs.

I have verified that this fixes the issue on `-march=znver2`, but also
does not cause any issues on `-march=x86-64` and `-march=i686`.

Fixes Blosc#621
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants