Hi tch-rs community!
I wish to ask a short question: how can I enable TF32 in tch (for CUDA)? This option can make recent NVIDIA GPUs dramatically faster when full `f32` precision is not critical. I searched for `TF32` and `precision` in the code of this crate, but could not find such an option. My current workaround enables TF32 for GEMM, but not for cuDNN.
For comparison with other tools: in `candle-core` (Rust), this can be toggled explicitly.
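If I remember correctly, candle exposes a setter for reduced-precision f32 GEMM; a minimal sketch (the exact path and name `candle_core::cuda::set_gemm_reduced_precision_f32` are from memory and may differ between candle versions):

```rust
// Requires candle-core built with the "cuda" feature.
fn main() {
    // Ask candle to run f32 GEMMs with reduced (TF32) precision on CUDA.
    // Function name/path quoted from memory; check your candle version.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);
}
```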
In `cudarc`, it seems that TF32 is not available through the `cublas` wrapper, while TF32 is enforced (hardcoded) in the `cublaslt` wrapper. So using `cudarc::cublaslt::safe` will automatically run GEMM with TF32.
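For the GEMM side of my workaround, one can drop down to the raw bindings and set the cuBLAS math mode by hand. A sketch, assuming cudarc's generated `sys` module exposes `cublasSetMathMode` and the `CUBLAS_TF32_TENSOR_OP_MATH` enum (these mirror the cuBLAS C API, but the Rust-side names and accessors may differ between cudarc versions):

```rust
use cudarc::cublas::{sys, CudaBlas};
use cudarc::driver::CudaDevice;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?;
    let blas = CudaBlas::new(dev)?;
    // Tell cuBLAS to use TF32 tensor cores for subsequent f32 GEMMs on
    // this handle; this mirrors the cublasSetMathMode C call.
    unsafe {
        sys::cublasSetMathMode(
            *blas.handle(),
            sys::cublasMath_t::CUBLAS_TF32_TENSOR_OP_MATH,
        );
    }
    Ok(())
}
```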
In `pytorch` (Python), this is done via global flags (https://pytorch.org/docs/stable/notes/cuda.html).
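Per the linked docs, that amounts to:

```python
import torch

# Allow TF32 for f32 matmuls routed through cuBLAS.
torch.backends.cuda.matmul.allow_tf32 = True
# Allow TF32 inside cuDNN (e.g. convolutions).
torch.backends.cudnn.allow_tf32 = True
```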