-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New bitshuffle functions #567
Conversation
Although it is slower that our existing code, and of course, slower than the generic code (at least on an ARM M1), at least this is completely correct, even for typesizes 1, 2 and 4 (the previous code was not). Also, syncing sources with the bitshuffle project makes things easier when porting docs and fixes from there.
In preparation for the merging of Blosc#567.
In preparation for the merging of Blosc/c-blosc2#567.
In preparation for the merging of #567.
This should be mostly ready, but for some reason, fuzzing is failing in CI. I have tried to reproduce that in my Ubuntu box, but I cannot. Also, valgrind seems fine for the dataset causing the issue to the |
Log shows the input that causes the crash is base64 |
Yes, I did exacly that, but I am unable to reproduce the issue:
Am I missing something? |
I haven't tried it but I can later. Interesting it fails in |
It doesn't look like you are doing anything controversial in that function. |
This is what I think. Do you know if there is a way to silence the error and treat it as a false positive? |
Here are some of my findings:
|
Right. Done in 5cd384c. And hey! this seems to fix the OSS-Fuzz issue, although frankly, I don't understand why. Incidentally, I also activated a solution based on __builtin_cpu_supports that works well on modern GCC and Clang compilers. This looks cleaner than the other based in
Yup, I tried to update to zlib-ng 2.1.x a while ago, but still doesn't work. For more context, see also zlib-ng/zlib-ng#1560. I suppose the solution is close, but I have not figure it out yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Congrats on making this work!
I am going to merge this. BTW, the ALTIVEC implementation for bitshuffle needed a small API change. @kif or @t20100 please check this out in a POWER machine when you have the opportunity. Thanks! |
In preparation for the merging of Blosc/c-blosc2#567.
For completeness, here are the results for Zstd. As this is quite more CPU intensive than LZ4, the speedup due to the bitshuffle acceleration is not as evident, but one can still see up to 15% speed-ups: As for decompression, there is almost no difference (and sometimes AVX2 seems to do a better job): |
Hi @FrancescAlted , EDIT: running b2bench uses AVX512 in anyway?
thanks.. |
Yes,
|
Hi, I updated |
thanks for details.. my setup is nearly identical in perf.. |
Changes from 2.11.1 to 2.11.2 ============================= * Added support for ARMv7l platforms (Raspberry Pi). The NEON version of the bitshuffle filter was not compiling there, and besides it offered no performance advantage over the generic bitshuffle version (it is 2x to 3x slower actually). So bitshuffle-neon.c has been disabled by default in all ARM platforms. * Also, unaligned access has been disabled in all ARM non-64bits platforms. It turned out that, at least the armv7l CPU in Raspberry Pi 4, had issues because `__ARM_FEATURE_UNALIGNED` C macro was asserted in the compiler (both gcc and clang), but it actually made binaries to raise a "Bus error". * Thanks to Ben Nuttall for providing a Raspberry Pi for tracking down these issues. Changes from 2.11.0 to 2.11.1 ============================= * Fix ALTIVEC header. Only affects to IBM POWER builds. Thanks to Michael Kuhn for providing a patch. Changes from 2.10.5 to 2.11.0 ============================= * New AVX512 support for the bitshuffle filter. This is a backport of the upstream bitshuffle project (/~https://github.com/kiyo-masui/bitshuffle). Expect up to [20% better compression speed](Blosc/c-blosc2#567 (comment)) on AMD Zen4 architecture (7950X3D CPU). * Add c-blosc2 package definition for Guix. Thanks to Ivan Vilata. * Properly check calls to `strtol`. * Export the `b2nd_copy_buffer` function. This may be useful for other projects dealing with multidimensional arrays in memory. Thanks to Ivan Vilata. * Better check that nthreads must be >= 1 and <= INT16_MAX. * Fix compile arguments for armv7l. Thanks to Ben Greiner.
In preparation for the merging of Blosc/c-blosc2#567.
Here it is a new implementation of bitshuffle for AVX512 coming from upstream. I also took the opportunity to sync the SSE2, AVX2 and ARM/NEON implementions (however, the latter is still pretty much slower than the generic solution, so it is not enabled by default).
Preliminary results using AVX512 on Zen4 (AMD 7950X) points to an speed-up of 10% to 15%, which is quite good (most specially as Zen4 does not have a full 512-bit register implementation).
Finally, AVX512 codepath will be compiled by default on UNIX platforms (still need to figure out which MSVC version supports AVX512), and will be used when AVX512 (more specifically AVX512F and AVX512BW) presence should be detected in runtime.