Set NVRTC gpu-architecture flag to maximum supported version #845
+152
−6
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Set NVRTC gpu-architecture flag to maximum supported version for NVRTC & device
Closes #844
The maximum compute capability supported by the currently linked NVRTC that is less than or equal to the device's architecture is used for RTC compilation.
This fixes an issue where running an RTC model on consume ampere (SM_86) would fail on CUDA 11.0 and older, which are not aware of SM_86's existence.
CUDA 11.2+ includes methods to query which architectures are supported by the dynamically linked NVRTC (which may add or remove architectures in new releases, and due to a stable ABI from 11.2 for all 11.x releases the linked version can be different than the version available at compile time).
CUDA 11.1 and below (11.1, 11.0 and 10.x currently in our case) do not include these methods, and due to the absence of a stable nvrtc ABI for these versions the known values can be hardcoded at compile time (grim but simple).
A method to select the most appropriate value form an ascending order vector has also been introduced, so this gencode functionality can be programmatically tested without having to predict what values would be appropriate based on the current device and the cuda version used, which is a moving target.
CUDA 11.2,
SEATBELTS=OFF
tests pass for SM_86 deviceCUDA 10.2,
SEATBELTS=ON
tests pass for SM_86 device (all RTC tests would have failed previously).CUDA 11.0 current
master
errorWith this fix, it succeeds: