# Hetero-Paged-Infer

**High-Performance LLM Inference Engine**

*PagedAttention + Continuous Batching*

[Get Started](setup/quickstart.md){ .md-button .md-button--primary }
[GitHub](https://github.com/LessUp/hetero-paged-infer){ .md-button }

## Features

| Feature | Description |
|---------|-------------|
| PagedAttention | Block-based KV Cache, <5% memory waste |
| Continuous Batching | Dynamic prefill/decode scheduling |
| Production Ready | Error handling, metrics, monitoring |
| Well Tested | 135 tests (unit, property, integration) |

## Quick Start

    git clone https://github.com/LessUp/hetero-paged-infer.git
    cd hetero-paged-infer
    cargo build --release
    ./target/release/hetero-infer --input "Hello, world!" --max-tokens 50


| Method | Memory Waste | Throughput |
|--------|--------------|------------|
| Static Allocation | ~40-60% | Baseline |
| PagedAttention | <5% | +50% |
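
To get a feel for where the gap in memory waste comes from, here is a minimal back-of-the-envelope sketch. It assumes static allocation reserves `max_len` token slots per sequence up front, while paged allocation reserves only whole fixed-size blocks as needed. The function names, the block size of 16, and the sequence lengths are illustrative assumptions, not taken from the project's code or benchmarks.

```rust
/// Fraction of reserved KV slots left unused when every sequence is given
/// `max_len` slots up front. Assumes every length is <= `max_len` and the
/// slice is non-empty (hypothetical helper, not part of the project).
fn static_waste(seq_lens: &[usize], max_len: usize) -> f64 {
    let reserved = seq_lens.len() * max_len;
    let used: usize = seq_lens.iter().sum();
    reserved.saturating_sub(used) as f64 / reserved as f64
}

/// Fraction of reserved KV slots left unused when each sequence holds
/// ceil(len / block_size) blocks; only the last block can be partly empty.
fn paged_waste(seq_lens: &[usize], block_size: usize) -> f64 {
    let reserved: usize = seq_lens
        .iter()
        .map(|&len| (len + block_size - 1) / block_size * block_size)
        .sum();
    let used: usize = seq_lens.iter().sum();
    (reserved - used) as f64 / reserved as f64
}
```

With four sequences of 500, 1000, 1500, and 1000 tokens and `max_len = 2000`, static allocation in this model leaves half of the reserved slots unused, while 16-token blocks leave under 1% unused. Real workloads will differ; the table above reports the project's own figures.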
## Architecture

    ┌─────────────────────────────────────────────────┐
    │              InferenceEngine (CPU)              │
    │  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
    │  │Tokenizer │  │Scheduler │  │ KV Cache Mgr │   │
    │  └────┬─────┘  └────┬─────┘  └──────┬───────┘   │
    │       └─────────────┼───────────────┘           │
    ├─────────────────────┼───────────────────────────┤
    │               ┌─────▼─────┐                     │
    │               │    GPU    │  Executor + Memory  │
    │               └───────────┘                     │
    └─────────────────────────────────────────────────┘
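
As a rough illustration of what the Scheduler box does under continuous batching, the sketch below admits waiting requests into free batch slots on every iteration instead of waiting for the whole batch to finish. All names here (`Scheduler`, `Request`, `Phase`, `step`) are hypothetical and chosen for the example; the project's actual types and scheduling policy may differ, and KV cache accounting is left out entirely.

```rust
use std::collections::VecDeque;

/// Which kind of work a request needs in the current iteration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Prefill, // first pass over the prompt
    Decode,  // one token per iteration afterwards
}

struct Request {
    id: u32,
    remaining: usize, // iterations left, counting the prefill pass
    phase: Phase,
}

/// Toy continuous-batching scheduler (illustrative only).
struct Scheduler {
    waiting: VecDeque<Request>,
    running: Vec<Request>,
    max_batch: usize,
}

impl Scheduler {
    fn new(max_batch: usize) -> Self {
        Scheduler { waiting: VecDeque::new(), running: Vec::new(), max_batch }
    }

    fn submit(&mut self, id: u32, max_tokens: usize) {
        self.waiting.push_back(Request { id, remaining: max_tokens, phase: Phase::Prefill });
    }

    /// One engine iteration: fill free slots from the waiting queue, report
    /// the batch that would be sent to the GPU, then retire finished requests.
    fn step(&mut self) -> Vec<(u32, Phase)> {
        while self.running.len() < self.max_batch {
            match self.waiting.pop_front() {
                Some(req) => self.running.push(req),
                None => break,
            }
        }
        let batch: Vec<(u32, Phase)> =
            self.running.iter().map(|r| (r.id, r.phase)).collect();
        for req in self.running.iter_mut() {
            req.phase = Phase::Decode; // after its first pass a request only decodes
            req.remaining = req.remaining.saturating_sub(1);
        }
        self.running.retain(|r| r.remaining > 0);
        batch
    }
}
```

With `max_batch = 2` and three requests submitted, the third request joins the batch as soon as the shortest one finishes, so a single batch can mix prefill and decode work.
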
## Documentation