Instant neural graphics primitives: lightning fast NeRF and more
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
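To make the programming model concrete, here is a minimal sketch (not from any of the listed projects) of the canonical CUDA pattern: a kernel in which each thread processes one array element, launched over a grid of thread blocks sized to cover the input.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard against overshoot
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps this sketch short; real code often uses
    // explicit cudaMalloc/cudaMemcpy for performance.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                   // wait for the GPU

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with `nvcc` on a CUDA-capable machine; the libraries listed below build far more sophisticated kernels on top of this same thread/block model.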
cuGraph - RAPIDS Graph Analytics Library
GPU Accelerated t-SNE for CUDA with Python bindings
FlashInfer: Kernel Library for LLM Serving
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Fuse multiple depth frames into a TSDF voxel volume.
A throughput-oriented high-performance serving framework for LLMs
CUDA Kernel Benchmarking Library
Graphics Processing Units Molecular Dynamics
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
PopSift is an implementation of the SIFT algorithm in CUDA.
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
A simple GPU hash table implemented in CUDA using lock free techniques
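The core lock-free trick behind a GPU hash table is claiming a slot with a single `atomicCAS` on the key, then probing linearly on collision. A hedged sketch, assuming an open-addressed table with 32-bit keys, a reserved empty-key sentinel, and a hypothetical `insert` helper:

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Sketch only: open-addressed GPU hash table. A thread owns a slot the
// moment its atomicCAS on the key succeeds; no locks are needed.
constexpr uint32_t kEmpty = 0xffffffffu;   // sentinel: slot is free
constexpr uint32_t kCapacity = 1u << 20;   // must be a power of two

struct Slot { uint32_t key; uint32_t value; };

__device__ uint32_t hashKey(uint32_t k) {
    // Simple integer mixer, masked to the table size.
    k ^= k >> 16; k *= 0x85ebca6bu; k ^= k >> 13;
    return k & (kCapacity - 1);
}

__device__ void insert(Slot* table, uint32_t key, uint32_t value) {
    uint32_t slot = hashKey(key);
    while (true) {
        // atomicCAS returns the previous key: if it was empty, this
        // thread just claimed the slot; if it equals our key, the key
        // is already present and we update its value.
        uint32_t prev = atomicCAS(&table[slot].key, kEmpty, key);
        if (prev == kEmpty || prev == key) {
            table[slot].value = value;
            return;
        }
        slot = (slot + 1) & (kCapacity - 1);  // linear probing
    }
}
```

Deletion and concurrent value updates need extra care (tombstones, atomics on the value); the linked repository discusses those trade-offs.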
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
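The WMMA API mentioned above exposes tensor cores through warp-level fragment types. As a minimal sketch (a single 16x16x16 tile; a real HGEMM tiles over the full matrices and stages data through shared memory):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes D = A * B for a single 16x16x16 tile using
// tensor cores. Requires sm_70 or newer.
__global__ void wmmaTile(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);       // accumulator starts at zero
    wmma::load_matrix_sync(fa, a, 16);    // leading dimension = 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(acc, fa, fb, acc);     // tensor-core multiply-accumulate
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}
```

MMA PTX instructions give finer control than this fragment-level API, which is where the further optimization methods in the repository come in.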
Created by NVIDIA
Released June 23, 2007