jmaczan/tiny-vllm

tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM

808 51

last commit 2026-04-14

ai attention batching course cpp cuda hpc inference llm llm-inference

Languages

C++ 98.1%, Cuda 1.8%, CMake 0.1%, Shell 0.1%

Contributors: 1

jmaczan

Install

Build from source
