Project Status
CuFlash-Attn is maintained as a stable v0.3.0 reference implementation for learning, auditing, and lightweight integration.
What stays in scope
- CUDA C++ FlashAttention from scratch
- Forward and backward passes for float and half
- Supported head_dim values: 32, 64, 128
- Public C++ API and C ABI for ctypes-style integration
- Bilingual technical documentation and GitHub Pages publishing
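
As a rough illustration of the scope listed above, a caller could guard its calls into the library with a small pre-flight check. This is only a sketch: it is written in Python because the C ABI targets ctypes-style integration, and the names `SUPPORTED_HEAD_DIMS`, `SUPPORTED_DTYPES`, and `is_supported` are hypothetical, not part of the CuFlash-Attn API. Only the value sets themselves come from this page.

```python
# Hypothetical pre-flight check; none of these names belong to CuFlash-Attn.
# The value sets mirror the scope list in this document.
SUPPORTED_HEAD_DIMS = (32, 64, 128)
SUPPORTED_DTYPES = ("float", "half")


def is_supported(head_dim: int, dtype: str) -> bool:
    """Return True if (head_dim, dtype) falls inside the documented scope."""
    return head_dim in SUPPORTED_HEAD_DIMS and dtype in SUPPORTED_DTYPES


if __name__ == "__main__":
    for head_dim, dtype in [(64, "half"), (96, "float"), (128, "double")]:
        status = "in scope" if is_supported(head_dim, dtype) else "out of scope"
        print(f"head_dim={head_dim}, dtype={dtype}: {status}")
```

Rejecting an unsupported configuration on the caller side, before any data crosses the C ABI, keeps failures easy to read and avoids depending on how the library itself reports them.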
Maintenance posture
This repository now prefers:
- clarity over process
- deletion over framework sprawl
- stable behavior over speculative features
- one canonical source per topic
Canonical references
Validation boundaries
- Full CUDA builds require a working toolkit and nvcc
- GPU tests may be unavailable on documentation-only environments
- Docs and workflow cleanup should still keep the docs build and repository layout coherent
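
One way to respect these boundaries in a local script is to decide which checks to run based on whether nvcc can be found. The following is a minimal sketch under assumptions: the function names and step labels are hypothetical and do not correspond to any script in this repository. It also only detects the compiler, not a GPU, so GPU tests are left out of the plan.

```python
import shutil
from typing import List, Optional


def find_nvcc() -> Optional[str]:
    """Return the path to nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


def plan_validation(nvcc_path: Optional[str]) -> List[str]:
    """Pick validation steps for this environment (step labels are illustrative)."""
    steps = ["docs build"]  # always possible, even on documentation-only machines
    if nvcc_path is not None:
        steps.append("CUDA build")  # needs a working toolkit and nvcc
    return steps


if __name__ == "__main__":
    print(plan_validation(find_nvcc()))
```

Passing the nvcc path in as an argument, rather than looking it up inside `plan_validation`, keeps the decision logic testable on machines without a CUDA toolkit.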