Set NVRTC gpu-architecture flag to maximum supported version #845

ptheywood · 2022-05-06T17:01:34Z

Set NVRTC gpu-architecture flag to maximum supported version for NVRTC & device

Closes #844

The maximum compute capability supported by the currently linked NVRTC that is less than or equal to the device's architecture is used for RTC compilation.

This fixes an issue where running an RTC model on a consumer Ampere GPU (SM_86) would fail with CUDA 11.0 and older, as those releases are not aware that SM_86 exists.

CUDA 11.2+ provides methods to query which architectures the dynamically linked NVRTC supports. This matters because NVRTC may add or remove architectures in new releases, and because NVRTC has a stable ABI across all 11.x releases from 11.2 onwards, the linked version can differ from the version available at compile time.
CUDA 11.1 and below (currently 11.1, 11.0 and 10.x in our case) do not provide these methods. As NVRTC had no stable ABI for these versions, the known values can instead be hardcoded at compile time (grim but simple).
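A minimal sketch of the pre-11.2 fallback follows. The PR hardcodes the values at compile time; here the version is passed as parameters (standing in for macros such as `__CUDACC_VER_MAJOR__` / `__CUDACC_VER_MINOR__`) so the logic can be exercised on any machine. `knownNvrtcArchs` is a hypothetical name, and the lists are assumptions based on the architectures each CUDA release supports, not values copied from the PR; verify them against the NVRTC documentation for the toolkit in use. On 11.2+ the list would instead be obtained from NVRTC at runtime via `nvrtcGetNumSupportedArchs` and `nvrtcGetSupportedArchs`.

```cpp
#include <vector>

// Hypothetical fallback for NVRTC releases older than 11.2, which cannot be
// queried for their supported architectures at runtime.
// ASSUMPTION: these lists reflect the architectures each CUDA release
// supports; check them against the official documentation before use.
// Each list is sorted ascending, e.g. 80 means compute_80.
std::vector<int> knownNvrtcArchs(int major, int minor) {
    if (major == 11 && minor == 1) {
        // CUDA 11.1 added SM_86 (consumer Ampere).
        return {35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86};
    }
    if (major == 11 && minor == 0) {
        // CUDA 11.0 dropped SM_30/SM_32 and added SM_80, but knows nothing of SM_86.
        return {35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80};
    }
    if (major == 10) {
        return {30, 32, 35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75};
    }
    // 11.2+ should query NVRTC at runtime instead; other versions are not
    // covered by this sketch, so an empty list is returned.
    return {};
}
```

With CUDA 11.0 the newest known architecture is 80, which is why a hardcoded `compute_86` cannot be compiled by that NVRTC.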

A method to select the most appropriate value from an ascending-order vector has also been introduced, so this gencode functionality can be tested programmatically without having to predict the appropriate value from the current device and CUDA version, which is a moving target.
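The selection rule (the largest supported architecture that does not exceed the device's compute capability) can be sketched as follows. `selectArch` and `gpuArchitectureOption` are hypothetical names, not the functions added in this PR. The sketch assumes architectures are held as integers (e.g. 80 for compute_80) sorted ascending, and returns 0 when no entry is at or below the device's compute capability.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: return the largest entry of `archs` (sorted ascending)
// that is <= `deviceCC` (e.g. 86 for an SM_86 device), or 0 if none is.
int selectArch(const std::vector<int>& archs, int deviceCC) {
    // upper_bound finds the first element strictly greater than deviceCC;
    // the element before it, if any, is the largest one <= deviceCC.
    auto it = std::upper_bound(archs.begin(), archs.end(), deviceCC);
    if (it == archs.begin()) {
        return 0;  // every supported architecture is newer than the device
    }
    return *std::prev(it);
}

// Hypothetical helper: build the NVRTC option string for the chosen
// architecture, matching the form seen in the log below,
// e.g. "--gpu-architecture=compute_80".
std::string gpuArchitectureOption(int arch) {
    return "--gpu-architecture=compute_" + std::to_string(arch);
}
```

Because the function only takes a vector and an integer, it can be checked against fixed inputs regardless of the GPU or CUDA version on the test machine: for an SM_86 device and the list `{35, 50, 60, 70, 75, 80}`, it picks 80 rather than the unsupported 86.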

CUDA 11.2, SEATBELTS=OFF: tests pass on an SM_86 device

[----------] Global test environment tear-down
[==========] 877 tests from 77 test suites ran. (48705 ms total)
[  PASSED  ] 877 tests.

  YOU HAVE 160 DISABLED TESTS

CUDA 10.2, SEATBELTS=ON: tests pass on an SM_86 device (previously, all RTC tests would have failed).

[==========] 997 tests from 79 test suites ran. (61473 ms total)
[  PASSED  ] 997 tests.

  YOU HAVE 40 DISABLED TESTS

CUDA 11.0, error on current master:

./bin/Release/boids_rtc_spatial3D -s 1 
Compiler options: --gpu-architecture=compute_86 --generate-line-info -DNDEBUG --std=c++17 --define-macro=SEATBELTS=1 --pre-include=/usr/local/cuda-11.6/include//cuda.h 
terminate called after throwing an instance of 'flamegpu::exception::InvalidAgentFunc'
  what():  FLAMEGPU2/src/flamegpu/util/detail/JitifyCache.cu(380): Error compiling runtime agent function (or function condition) ('outputdata'): function had compilation errors (see std::cout), in JitifyCache::buildProgram().
Aborted (core dumped)

With this fix, it succeeds:

./bin/Release/boids_rtc_spatial3D -s 1 -v
FLAME GPU 2.0.0-alpha.3+3b0c3e96
Processing Simulation Step 0

src/flamegpu/util/detail/JitifyCache.cu

src/flamegpu/util/detail/compute_capability.cu

Robadob

Looks like a solid PR. Haven't run it locally, but I assume your tests are sufficient.


ptheywood added the RTC label May 6, 2022

ptheywood added this to the v2.0.0-alpha.3 milestone May 6, 2022

ptheywood requested review from Robadob and mondus May 6, 2022 17:01

Robadob reviewed May 6, 2022

src/flamegpu/util/detail/JitifyCache.cu (outdated, resolved)

Robadob reviewed May 6, 2022

src/flamegpu/util/detail/compute_capability.cu (resolved)

Robadob approved these changes May 6, 2022

ptheywood force-pushed the nvrtc-arch branch from 3b0c3e9 to 09b0f01 May 9, 2022 09:42

mondus merged commit 7652ec1 into master May 11, 2022

mondus deleted the nvrtc-arch branch May 11, 2022 14:36
