RTC models do not compile for unknown future CUDA Architectures #844
Comments
The fix for this is to make use of If it is not in the list, passing the latest arch that is supported should work (i.e. the last value returned by |
Edit: Support for these older CUDA versions could be:
|
…the current nvrtc + device

Closes #844

The maximum compute capability supported by the currently linked NVRTC that is less than or equal to the device's architecture is used for RTC compilation. This fixes an issue where running an RTC model on consumer Ampere (SM_86) would fail on CUDA 11.0 and older, which are not aware of SM_86's existence.

CUDA 11.2+ includes methods to query which architectures are supported by the dynamically linked NVRTC. This matters because new releases may add or remove architectures, and because the stable ABI across all 11.x releases from 11.2 onwards means the linked version can differ from the version available at compile time.

CUDA 11.1 and below (currently 11.1, 11.0 and 10.x in our case) do not include these methods. As these versions have no stable NVRTC ABI, the known values can be hardcoded at compile time (grim but simple).

A method to select the most appropriate value from an ascending-order vector has also been introduced, so this gencode functionality can be tested programmatically without having to predict which values would be appropriate for the current device and CUDA version, which is a moving target.
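The selection step described in this commit message could look roughly like the sketch below. The function name, signature, and the choice of 0 as the "nothing suitable" return value are assumptions made for illustration, not the project's actual code. On CUDA 11.2+ the input vector would presumably be filled from NVRTC's `nvrtcGetNumSupportedArchs` and `nvrtcGetSupportedArchs`; on older versions it would be a hardcoded list.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (name and signature are assumptions).
// Returns the largest value in `archs` that is less than or equal to `target`,
// or 0 if no such value exists. `archs` must be sorted in ascending order.
int selectAppropriateArch(int target, const std::vector<int>& archs) {
    // upper_bound yields the first element strictly greater than target,
    // so the element just before it (if any) is the best match.
    auto it = std::upper_bound(archs.begin(), archs.end(), target);
    if (it == archs.begin()) {
        return 0;  // list is empty, or every supported arch is newer than the device
    }
    return *(it - 1);
}
```

Because the function takes the list as a parameter, it can be tested with fixed inputs regardless of the installed GPU or CUDA version. For example, with an illustrative list ending in 80, a device architecture of 86 would select 80.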
RTC models are compiled for the device's compute capability, i.e. when running on a consumer Ampere GPU, NVRTC is passed --gpu-architecture=compute_86. However, if the version of NVRTC does not know about the GPU architecture, this will fail to compile, and the user can do nothing about it (other than use a more recent NVRTC).
This means that RTC models will not run on newer GPUs (without using newer features), unlike non-RTC models which will (via PTX embedding / JITing).
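To make the failure mode concrete, here is a minimal sketch of how such an architecture option might be derived from a device's compute capability. The helper name is hypothetical, and the real code may build the string differently. The point is that the option is derived purely from the device, with no check against what NVRTC supports.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration): builds the NVRTC
// architecture option from a compute capability given as major and minor.
// Assumes a single-digit minor version, as in 8.6 -> compute_86.
std::string gpuArchitectureOption(int major, int minor) {
    return "--gpu-architecture=compute_" + std::to_string(major * 10 + minor);
}
```

For a compute capability 8.6 device this yields --gpu-architecture=compute_86, which an NVRTC that only knows architectures up to SM_80 would reject.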
To reproduce this: CUDA 11.0 knows SM_80 but not SM_86, so attempting to run a CUDA 11.0 RTC model on consumer Ampere will fail during RTC compilation, with an error such as: