A high-throughput and memory-efficient inference and serving engine for LLMs
-
Updated
Jan 18, 2025 - Python
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
A high-throughput and memory-efficient inference and serving engine for LLMs
Build and run Docker containers leveraging NVIDIA GPUs
Instant neural graphics primitives: lightning fast NeRF and more
kaldi-asr/kaldi is the official location of the Kaldi project.
Open3D: A Modern Library for 3D Data Processing
Containers for machine learning
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Modular ZK(Zero Knowledge) backend accelerated by GPU
SGLang is a fast serving framework for large language models and vision language models.
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
CUDA Templates for Linear Algebra Subroutines
A flexible framework of neural networks for deep learning
Created by Nvidia
Released June 23, 2007