- CUDA Toolkit 12.x
- CMake 3.18+ (3.20+ is needed for the `cmake --build --preset` command)
- C++17-compatible compiler
- NVIDIA GPU (compute capability 7.0+)
📊
Progressive optimization
Naive → Tiled → Coalesced → Double Buffer → Register Blocked → Fused → Vectorized
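The first step of that chain, Naive → Tiled, can be illustrated on the CPU. The following is a minimal C++17 sketch, not the engine's actual kernels: the names `Matrix`, `matmul_naive`, and `matmul_tiled` and the default tile size of 32 are assumptions made for illustration. On a GPU the tile would typically be staged in shared memory by a thread block; here the same blocking idea is expressed as plain loops so the result can be checked against the naive version.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a row-major n x n matrix stored as a flat vector.
using Matrix = std::vector<float>;

// Naive version: each output element is an independent dot product,
// so every element of a and b is re-read n times with no blocking.
void matmul_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

// Tiled version: the i, j, and k loops are split into tile-sized blocks so
// that a small sub-matrix of a and b is reused while it is still in fast
// memory. The tile size is an illustrative assumption, not a tuned value.
void matmul_tiled(const Matrix& a, const Matrix& b, Matrix& c,
                  std::size_t n, std::size_t tile = 32) {
    std::fill(c.begin(), c.end(), 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile) {
        const std::size_t i_end = std::min(i0 + tile, n);
        for (std::size_t k0 = 0; k0 < n; k0 += tile) {
            const std::size_t k_end = std::min(k0 + tile, n);
            for (std::size_t j0 = 0; j0 < n; j0 += tile) {
                const std::size_t j_end = std::min(j0 + tile, n);
                // Multiply one tile of a by one tile of b and accumulate
                // the partial products into the matching tile of c.
                for (std::size_t i = i0; i < i_end; ++i) {
                    for (std::size_t k = k0; k < k_end; ++k) {
                        const float a_ik = a[i * n + k];
                        for (std::size_t j = j0; j < j_end; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; the tiled one only changes the order in which partial sums are accumulated, so floating-point results can differ slightly in the last bits for non-integer inputs. The later steps in the chain (coalescing, double buffering, register blocking, fusion, vectorization) are GPU-specific and are not covered by this sketch.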
git clone https://github.com/LessUp/mini-inference-engine.git
cd mini-inference-engine
cmake --preset release
cmake --build --preset release
./build-release/benchmark