-
Notifications
You must be signed in to change notification settings - Fork 627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect missing vst1q_f32_x2 and provide replacement if necessary #1358
Detect missing vst1q_f32_x2 and provide replacement if necessary #1358
Conversation
|
57a734e
to
1b9d235
Compare
Thanks for the fix, but see the suggestions above. |
Which suggestions? I don't see any; apologies if I'm looking in the wrong place. |
cmake/OpenEXRSetup.cmake
Outdated
}" HAS_VLD1) | ||
|
||
if(NOT HAS_VLD1) | ||
string(APPEND CMAKE_CXX_FLAGS " -DMISSING_ARM_VLD1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better as a CMake-generated define rather than a compiler flag:
string(APPEND CMAKE_CXX_FLAGS " -DMISSING_ARM_VLD1") | |
set(OPENEXR_MISSING_ARM_VLD1" 1) |
Then add:
#cmakedefine OPENEXR_MISSING_ARM_VLD1 0
to cmake/OpenEXRConfigInternal.h.in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These notes are now implemented; the flag is now set as a CMake-generated define and is in cmake/OpenEXRConfigInternal.h.in
.
src/lib/OpenEXR/ImfSimd.h
Outdated
@@ -62,4 +62,17 @@ extern "C" { | |||
|
|||
} | |||
|
|||
#if defined (MISSING_ARM_VLD1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and change this to OPENEXR_MISSING_ARM_VLD1
to identify it as an OpenEXR config thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is now implemented; the flag is now named OPENEXR_MISSING_ARM_VLD1
.
Oops, I neglected to submit the review, sorry. Should be there now. |
42094c7
to
49b2fa8
Compare
Whoops, sorry for the bevy of build test failures. I didn't realize that pushing a WIP to the branch on my fork would automatically update the merge request and kick off tests here! |
49b2fa8
to
540cc27
Compare
Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead. Signed-off-by: Yining Karl Li <betajippity@gmail.com>
540cc27
to
dc0e241
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…demySoftwareFoundation#1358) Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead. Signed-off-by: Yining Karl Li <betajippity@gmail.com>
…demySoftwareFoundation#1358) Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead. Signed-off-by: Yining Karl Li <betajippity@gmail.com>
Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead. Signed-off-by: Yining Karl Li <betajippity@gmail.com>
In ImfDwaCompressorSimd.h, convertFloatToHalf64_neon() uses the vst1q_f32_x2 intrinsic on aarch64. However, older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead.
Specific examples of platforms where this fix is needed are CentOS 8 on aarch64, and NVIDIA's Linux For Tegra 32.x on Jetson Nano.