llm-serving

Here are 4 public repositories matching this topic...

A highly optimized LLM inference acceleration engine for Llama and its variants.

cuda pytorch llama gpt inference-engine model-serving llm llm-serving llm-inference deepseek-r1

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

inference llama gpt model-serving llm llmops llm-serving

Serving Inside PyTorch

deployment inference pytorch ray serve tensorrt serving pipeline-parallelism torch2trt triton-inference-server llm-serving

High-speed, easy-to-use LLM serving framework for local deployment

smartphone llama npu llm llm-serving llm-inference qwen smallthinker
