Set NVRTC gpu-architecture flag to maximum supported version #845

Merged on May 11, 2022 (1 commit)


@ptheywood ptheywood commented May 6, 2022

Set NVRTC gpu-architecture flag to maximum supported version for NVRTC & device

Closes #844

The maximum compute capability supported by the currently linked NVRTC that is less than or equal to the device's architecture is used for RTC compilation.

This fixes an issue where running an RTC model on consumer Ampere (SM_86) would fail on CUDA 11.0 and older, which are not aware of SM_86's existence.

CUDA 11.2+ includes methods to query which architectures are supported by the dynamically linked NVRTC. This matters because new releases may add or remove architectures, and because the ABI is stable across all 11.x releases from 11.2 onward, the linked version can differ from the version available at compile time.
CUDA 11.1 and below (currently 11.1, 11.0 and 10.x in our case) do not include these methods. As NVRTC has no stable ABI for these versions, the known values can be hardcoded at compile time (grim but simple).
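
For illustration, the hardcoded fallback could look roughly like the sketch below. On CUDA 11.2+ the list would instead come from NVRTC's `nvrtcGetNumSupportedArchs` and `nvrtcGetSupportedArchs`. The function name `hardcodedSupportedArchs`, the use of a runtime version argument in place of a preprocessor check, and the exact architecture lists are assumptions made for this example (the lists follow the public CUDA release notes as I understand them), not code taken from this PR.

```cpp
#include <vector>

// Hypothetical fallback for NVRTC releases older than 11.2, which lack an
// architecture query API. `cudaVersion` uses the CUDA_VERSION encoding
// (major * 1000 + minor * 10), e.g. 11010 for CUDA 11.1.
// The returned lists are in ascending order and are an assumption for
// illustration, not values copied from the PR.
std::vector<int> hardcodedSupportedArchs(int cudaVersion) {
    if (cudaVersion >= 11010) {
        // CUDA 11.1: adds consumer Ampere (compute_86).
        return {35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86};
    } else if (cudaVersion >= 11000) {
        // CUDA 11.0: knows compute_80 but not compute_86.
        return {35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80};
    } else if (cudaVersion >= 10000) {
        // CUDA 10.x: up to Turing (compute_75).
        return {30, 32, 35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75};
    }
    // Older toolkits are out of scope for this sketch.
    return {};
}
```

In a real build the branch would more likely be chosen with `#if` on `CUDA_VERSION`, since the linked NVRTC matches the compile-time toolkit for these versions.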

A method to select the most appropriate value from a vector sorted in ascending order has also been introduced, so this gencode functionality can be programmatically tested without having to predict which values would be appropriate for the current device and CUDA version, which is a moving target.
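
A minimal sketch of such a selection method is shown here, assuming the input vector is sorted in ascending order. The names `selectMaxSupportedArch` and `formatArchFlag`, and the choice to return 0 when nothing qualifies, are hypothetical and may differ from what the PR actually implements.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the largest value in `ascending` that is
// less than or equal to `target`, or 0 if no such value exists (including
// when the vector is empty). Assumes `ascending` is sorted ascending.
int selectMaxSupportedArch(int target, const std::vector<int>& ascending) {
    // upper_bound yields the first element strictly greater than target.
    auto it = std::upper_bound(ascending.begin(), ascending.end(), target);
    if (it == ascending.begin()) {
        return 0;  // every entry exceeds target, or the vector is empty
    }
    return *(it - 1);
}

// Hypothetical helper: formats the NVRTC option for a chosen architecture.
std::string formatArchFlag(int arch) {
    return "--gpu-architecture=compute_" + std::to_string(arch);
}
```

For example, with an SM_86 device and an NVRTC that only knows architectures up to 80, `selectMaxSupportedArch(86, {35, 50, 60, 70, 75, 80})` yields 80, so the flag becomes `--gpu-architecture=compute_80` rather than the unsupported `compute_86`. Because the function takes both the target and the list as arguments, it can be tested with fixed inputs independent of the installed device and toolkit.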


CUDA 11.2, SEATBELTS=OFF tests pass for SM_86 device

[----------] Global test environment tear-down
[==========] 877 tests from 77 test suites ran. (48705 ms total)
[  PASSED  ] 877 tests.

  YOU HAVE 160 DISABLED TESTS

CUDA 10.2, SEATBELTS=ON tests pass for SM_86 device (all RTC tests would have failed previously).

[==========] 997 tests from 79 test suites ran. (61473 ms total)
[  PASSED  ] 997 tests.

  YOU HAVE 40 DISABLED TESTS

CUDA 11.0 current master error

./bin/Release/boids_rtc_spatial3D -s 1 
Compiler options: --gpu-architecture=compute_86 --generate-line-info -DNDEBUG --std=c++17 --define-macro=SEATBELTS=1 --pre-include=/usr/local/cuda-11.6/include//cuda.h 
terminate called after throwing an instance of 'flamegpu::exception::InvalidAgentFunc'
  what():  FLAMEGPU2/src/flamegpu/util/detail/JitifyCache.cu(380): Error compiling runtime agent function (or function condition) ('outputdata'): function had compilation errors (see std::cout), in JitifyCache::buildProgram().
Aborted (core dumped)

With this fix, it succeeds:

./bin/Release/boids_rtc_spatial3D -s 1 -v
FLAME GPU 2.0.0-alpha.3+3b0c3e96
Processing Simulation Step 0

@ptheywood ptheywood added the RTC label May 6, 2022
@ptheywood ptheywood added this to the v2.0.0-alpha.3 milestone May 6, 2022
@ptheywood ptheywood requested review from Robadob and mondus May 6, 2022 17:01
@Robadob Robadob left a comment


Looks like a solid PR. Haven't run it locally, but assume your tests are sufficient.

@mondus mondus merged commit 7652ec1 into master May 11, 2022
@mondus mondus deleted the nvrtc-arch branch May 11, 2022 14:36
Successfully merging this pull request may close these issues.

RTC models do not compile for unknown future CUDA Architectures