Tiny-LLM
CUDA C++ Inference, Kept Small

A focused Transformer inference engine with W8A16 kernels (8-bit integer weights, 16-bit floating-point activations), explicit KV cache management, and a deliberately small repository surface.

Released under the MIT License.