CuFlash-Attn

From-scratch CUDA FlashAttention Reference Implementation

v0.3.0 Stable Baseline

⚡

O(N) Memory: Tiled algorithm with online softmax

🎯

Lean Maintenance: Docs and workflows match the real repository surface

🔧

FP32/FP16: Forward and backward kernels
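
To make the O(N) memory claim concrete, here is a minimal CPU sketch in plain Python of tiled attention for a single query row. It is not the project's CUDA kernel, and the names `naive_attention`, `tiled_attention`, and `tile` are hypothetical. It assumes the standard FlashAttention-style recurrence: a running maximum and a running sum of exponentials are carried across key/value tiles, so the full row of N scores is never stored.

```python
import math


def naive_attention(q, K, V):
    """Reference: materialise all N scores, then softmax and weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def tiled_attention(q, K, V, tile=2):
    """Illustrative sketch: process K/V in tiles, keeping only running stats."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(V[0])
    m = -math.inf        # running maximum of the scores seen so far
    l = 0.0              # running sum of exp(score - m)
    acc = [0.0] * d_v    # running unnormalised output
    for start in range(0, len(K), tile):
        k_tile = K[start:start + tile]
        v_tile = V[start:start + tile]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
        m_new = max(m, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first tile m is -inf, so the correction is exp(-inf) == 0.0.
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            p = math.exp(s - m_new)
            l += p
            for j in range(d_v):
                acc[j] += p * v[j]
        m = m_new
    return [a / l for a in acc]
```

Per query row, the tiled version holds only one tile of scores plus `m`, `l`, and `acc`. Across N query rows that is O(N) extra memory instead of the O(N²) score matrix, and the result matches the naive computation up to floating-point rounding.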