
CuFlash-Attn

From-scratch CUDA FlashAttention Reference Implementation

v0.3.0 Stable Baseline

O(N) Memory: Tiled algorithm with online softmax
🎯
Lean Maintenance: Docs and workflows match the real repository surface
🔧
FP32/FP16: Forward and backward kernels
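
The O(N) memory claim above rests on processing keys and values tile by tile while keeping a running softmax maximum and normalizer, so the full N x N score matrix is never stored. The following is a minimal CPU sketch of that idea for a single query row, not the project's CUDA kernels. The function names, the `tile` parameter, and the 1/sqrt(d) scaling are assumptions made for illustration.

```python
import math


def attention_row_reference(q, keys, values):
    """Naive single-query attention: materializes all N scores at once."""
    scale = 1.0 / math.sqrt(len(q))  # assumed standard 1/sqrt(d) scaling
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(d)]


def attention_row_tiled(q, keys, values, tile=2):
    """Streaming version: visits K/V in tiles and keeps only O(d) state."""
    scale = 1.0 / math.sqrt(len(q))
    d = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator relative to running_max
    acc = [0.0] * d           # unnormalized output relative to running_max

    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first tile this is exp(-inf) == 0.0, which is harmless
        # because running_sum and acc are still zero.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_tile):
            p = math.exp(s - new_max)
            running_sum += p
            for j in range(d):
                acc[j] += p * v[j]

        running_max = new_max

    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any tile size. In a GPU kernel the same running maximum, running sum, and accumulator would typically live in registers or shared memory per query row.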
