`shuffle.c` is compiled with `-mavx512*` which results in `SIGILL` on lower CPUs #621

mgorny · 2024-06-21T17:38:07Z

Describe the bug
When compiler supports SSE2/AVX2/AVX512 instructions, the respective -m flags are added to the flags for shuffle.c. As a result, the compiler can compile arbitrary code with these instruction sets. For me, GCC 14 creates a library that uses AVX512 instructions on unsupported CPU and effectively crashes with SIGILL, e.g.:

$ ninja test
[0/1] Running tests...
Test project /tmp/c-blosc2/build
          Start    1: test_plugin_test_ndlz
   1/1746 Test    #1: test_plugin_test_ndlz .....................................***Exception: Illegal  0.11 sec
          Start    2: test_plugin_test_zfp_acc_float
   2/1746 Test    #2: test_plugin_test_zfp_acc_float ............................***Exception: Illegal  0.10 sec
          Start    3: test_plugin_test_zfp_prec_float

For example, plugins/codecs/ndlz/test_ndlz crashes in get_shuffle_implementation:

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   0x555555582115 <get_shuffle_implementation+37>  jne    0x5555555821d0 <get_shuffle_implementation+224>                            │
│   0x55555558211b <get_shuffle_implementation+43>  vmovq  0x10ae55(%rip),%xmm0        # 0x55555568cf78                               │
│   0x555555582123 <get_shuffle_implementation+51>  vmovq  0x10adad(%rip),%xmm1        # 0x55555568ced8                               │
│   0x55555558212b <get_shuffle_implementation+59>  lea    0xe250a(%rip),%rcx        # 0x55555566463c                                 │
│   0x555555582132 <get_shuffle_implementation+66>  lea    0xa2807(%rip),%rax        # 0x555555624940 <unshuffle_generic>             │
│   0x555555582139 <get_shuffle_implementation+73>  lea    0xa2fc0(%rip),%rdx        # 0x555555625100 <bshuf_untrans_bit_elem_scal>   │
│   0x555555582140 <get_shuffle_implementation+80>  vpinsrq $0x1,%rdx,%xmm1,%xmm1                                                     │
│   0x555555582146 <get_shuffle_implementation+86>  vpinsrq $0x1,%rax,%xmm0,%xmm0                                                     │
│   0x55555558214c <get_shuffle_implementation+92>  mov    %rcx,(%rdi)                                                                │
│   0x55555558214f <get_shuffle_implementation+95>  mov    %rdi,%rax                                                                  │
│  >0x555555582152 <get_shuffle_implementation+98>  vinserti32x4 $0x1,%xmm1,%ymm0,%ymm0                                               │
│   0x555555582159 <get_shuffle_implementation+105> vmovdqu %ymm0,0x8(%rdi)                                                           │
│   0x55555558215e <get_shuffle_implementation+110> vzeroupper                                                                        │
│   0x555555582161 <get_shuffle_implementation+113> ret                                                                               │
│   0x555555582162 <get_shuffle_implementation+114> data16 cs nopw 0x0(%rax,%rax,1)                                                   │
│   0x55555558216d <get_shuffle_implementation+125> nopl   (%rax)                                                                     │
│   0x555555582170 <get_shuffle_implementation+128> cmp    $0x208000,%edx                                                             │
│   0x555555582176 <get_shuffle_implementation+134> je     0x5555555821a0 <get_shuffle_implementation+176>                            │
│   0x555555582178 <get_shuffle_implementation+136> vmovq  0x10ad68(%rip),%xmm0        # 0x55555568cee8                               │
│   0x555555582180 <get_shuffle_implementation+144> vmovq  0x10ae30(%rip),%xmm1        # 0x55555568cfb8                               │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7ffff7caa7 In: get_shuffle_implementation                                                 L306  PC: 0x555555582152

To Reproduce

cmake -G Ninja ${srcdir} -DCMAKE_C_FLAGS='-march=znver2 -O2 -pipe' -DCMAKE_BUILD_TYPE=RelWithDebInfo
ninja
ninja test

Expected behavior
Not crashing, i.e. not using SSE2/AVX2/AVX512 instructions outside specific files intended for them :-).

Logs
CMake output: cmake.txt
Ninja output (for build): build.txt

System information:

OS: Gentoo Linux amd64
Compiler: GCC 14.1.1
Version: 2.15.0

Additional context
n/a

The text was updated successfully, but these errors were encountered:

FrancescAlted · 2024-06-22T10:55:05Z

Yes, shuffle.c is designed to statically include many SIMD instructions and then dynamically select the correct version at runtime. This worked pretty well for many years now.

I don't have an AMD Zen handy, but I have tried this out with an Intel i13900K and an assortment of different compilers (gcc-12, gcc-13, gcc-14, clang-16, clang-17 and clang-18), and only gcc-14 exhibits this issue. Could it be a bug introduced in gcc-14?

FWIW, removing "-march=znver2" in your example works for me.

mgorny · 2024-06-22T11:05:41Z

It's not a bug. If you tell compiler to use AVX512, then it is permitted to use AVX512. That's what the flag does, by design. If anything, it just means that GCC 14 improved the optimizer that it finds new optimization possibilities.

FrancescAlted · 2024-06-22T12:46:16Z

Ok. Then I'm curious on why removing "-march=znver2" in your example works, as blosc is still telling the compiler to use AVX512.

mgorny · 2024-06-22T13:16:19Z

It probably triggers a different optimization profile.

Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when compiling `shuffle.c`. From what I can see, the file itself does not use any of these intrinsics, and they are only used by functions declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective flags are still passed). This prevents the compiler from incidentally optimizing the code called independenlty of the runtime CPU check to these instruction sets, effectively causing `SIGILL` on other CPUs. I have verified that this fixes the issue on `-march=znver2`, but also does not cause any issues on `-march=x86-64` and `-march=i686`. Fixes Blosc#621

thesamesam · 2024-06-25T06:14:03Z

Indeed, see https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mmmx (scroll down a bit);

These options enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.

Do not pass `-msse2`, `-mavx2`, etc. flags to the compiler when compiling `shuffle.c`. From what I can see, the file itself does not use any of these intrinsics, and they are only used by functions declared in `bitshuffle-*.c` and `shuffle-*.c` (where the respective flags are still passed). This prevents the compiler from incidentally optimizing the code called independenlty of the runtime CPU check to these instruction sets, effectively causing `SIGILL` on other CPUs. I have verified that this fixes the issue on `-march=znver2`, but also does not cause any issues on `-march=x86-64` and `-march=i686`. Fixes Blosc#621

mgorny mentioned this issue Jun 25, 2024

Do not pass -m flags when compiling shuffle.c #622

Merged

FrancescAlted closed this as completed in #622 Jul 4, 2024

FrancescAlted closed this as completed in 2a07f34 Jul 4, 2024

mgorny mentioned this issue Sep 7, 2024

Please relax system c-blosc2 version constraints milesgranger/blosc2-rs#27

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`shuffle.c` is compiled with `-mavx512*` which results in `SIGILL` on lower CPUs #621

`shuffle.c` is compiled with `-mavx512*` which results in `SIGILL` on lower CPUs #621

mgorny commented Jun 21, 2024

FrancescAlted commented Jun 22, 2024

mgorny commented Jun 22, 2024

FrancescAlted commented Jun 22, 2024

mgorny commented Jun 22, 2024

thesamesam commented Jun 25, 2024

shuffle.c is compiled with -mavx512* which results in SIGILL on lower CPUs #621

shuffle.c is compiled with -mavx512* which results in SIGILL on lower CPUs #621

Comments

mgorny commented Jun 21, 2024

FrancescAlted commented Jun 22, 2024

mgorny commented Jun 22, 2024

FrancescAlted commented Jun 22, 2024

mgorny commented Jun 22, 2024

thesamesam commented Jun 25, 2024

`shuffle.c` is compiled with `-mavx512*` which results in `SIGILL` on lower CPUs #621

`shuffle.c` is compiled with `-mavx512*` which results in `SIGILL` on lower CPUs #621