A high-throughput and memory-efficient inference and serving engine for LLMs
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically accelerate computing applications by harnessing the power of GPUs.
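In the CUDA model, a kernel is launched as a grid of thread blocks, and each thread computes its own element from its block and thread indices. A pure-Python sketch of that indexing scheme (the names `saxpy_kernel` and `launch` are illustrative, not a real CUDA API; a real launch runs the threads in parallel on the GPU):

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """One 'thread': computes a single element of a*x + y, as a CUDA kernel would."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA C
    if i < len(x):                          # bounds check for the last partial block
        out[i] = a * x[i] + y[i]

def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate a <<<grid_dim, block_dim>>> kernel launch."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
block = 4
grid = (len(x) + block - 1) // block  # ceil-divide, the standard CUDA idiom
launch(grid, block, saxpy_kernel, 2.0, x, y, out)
print(out)  # -> [12.0, 24.0, 36.0, 48.0, 60.0]
```

The bounds check inside the kernel matters because the grid is rounded up to whole blocks, so the last block may have threads past the end of the array.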
Containers for machine learning
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks, with APIs for Python, R, Java, and C++. Supports computation on CPU and GPU.
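Gradient boosting on decision trees fits each new tree to the residuals of the ensemble so far. A minimal pure-Python sketch with depth-1 trees (stumps) and squared error, purely to show the mechanism; `fit_stump` and `gradient_boost` are hypothetical names, not any library's API:

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm

def gradient_boost(X, y, n_rounds=10, lr=0.5):
    """Each round fits a stump to the current residuals and adds it, scaled by lr."""
    base = sum(y) / len(y)               # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(row) for pi, row in zip(pred, X)]
    return lambda row: base + lr * sum(s(row) for s in stumps)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 3.0]
model = gradient_boost(X, y)
print(model([1.5]), model([3.5]))  # converges toward 1.0 and 3.0
```

Real libraries differ in the loss gradients used, tree depth, regularization, and split-finding, but the residual-fitting loop is the core idea.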
SGLang is a fast serving framework for large language models and vision language models.
A flexible framework of neural networks for deep learning
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A PyTorch Library for Accelerating 3D Deep Learning Research
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Implementations of assorted losses: label smoothing, AM-Softmax, Partial-FC, focal loss, triplet loss, and Lovász-softmax. Maybe useful.
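The first of these, label smoothing, replaces the one-hot target with a mixture of the one-hot vector and the uniform distribution before taking cross-entropy. A minimal pure-Python sketch (the function name `label_smooth_ce` is illustrative, not from the repository above):

```python
import math

def label_smooth_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) on the true class,
    eps spread uniformly over all classes."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
    log_probs = [l - log_z for l in logits]
    n = len(logits)
    smoothed = [eps / n + (1 - eps) * (1.0 if k == target else 0.0) for k in range(n)]
    return -sum(t * lp for t, lp in zip(smoothed, log_probs))

# With eps=0 this is plain cross-entropy; smoothing penalizes overconfidence.
loss = label_smooth_ce([2.0, 0.5, 0.1], target=0, eps=0.1)
print(loss)
```

Because the smoothed target keeps a little mass on every class, the loss never reaches zero, which discourages the model from driving logit gaps to infinity.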
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Pytorch domain library for recommendation systems
CUDA integration for Python, plus shiny features
Self-hosted, local-only NVR (network video recorder) and AI computer vision software. With features such as object detection, motion detection, face recognition, and more, it gives you the power to keep an eye on your home, office, or any other place you want to monitor.
PyTorch native quantization and sparsity for training and inference
🤖 A Python library for learning and evaluating knowledge graph embeddings
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
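Offline (post-training) quantization tools of this kind typically map float tensors to int8 using a per-tensor scale. A minimal symmetric-quantization sketch in pure Python, not any specific tool's API (`quantize_int8` and `dequantize` are hypothetical helper names):

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: scale so the largest |value| maps to 127."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard against all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error per element is at most scale / 2."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, scale, err)
```

Production tools add calibration over real activations, per-channel scales, and zero-points for asymmetric ranges, but this round-trip is the basic arithmetic.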
Created by NVIDIA. Released June 23, 2007.