Project Status

CuFlash-Attn is maintained as a stable v0.3.0 reference implementation for learning, auditing, and lightweight integration.

What stays in scope

CUDA C++ FlashAttention from scratch
Forward and backward passes for float and half
Supported head_dim values: 32, 64, 128
Public C++ API and C ABI for ctypes-style integration
Bilingual technical documentation and GitHub Pages publishing
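
The scope above can be turned into a pre-flight check on the caller's side, for example before passing arguments through a ctypes binding. The following is a minimal sketch only: `check_in_scope`, `SUPPORTED_HEAD_DIMS`, and `SUPPORTED_DTYPES` are hypothetical names chosen for illustration and are not part of the CuFlash-Attn API.

```python
# Hypothetical pre-flight check; these names are illustrative,
# not functions or constants exported by CuFlash-Attn.
SUPPORTED_HEAD_DIMS = (32, 64, 128)
SUPPORTED_DTYPES = ("float", "half")


def check_in_scope(head_dim: int, dtype: str) -> None:
    """Raise ValueError if the configuration falls outside the stated scope."""
    if head_dim not in SUPPORTED_HEAD_DIMS:
        raise ValueError(
            f"head_dim={head_dim} is not supported; "
            f"expected one of {SUPPORTED_HEAD_DIMS}"
        )
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(
            f"dtype={dtype!r} is not supported; "
            f"expected one of {SUPPORTED_DTYPES}"
        )


# An in-scope configuration passes silently.
check_in_scope(64, "half")
```

Rejecting out-of-scope configurations in the caller keeps error messages readable and avoids relying on how the native library reports unsupported shapes.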

Maintenance posture

This repository now prefers:

clarity over process
deletion over framework sprawl
stable behavior over speculative features
one canonical source per topic

Canonical references

Validation boundaries

Full CUDA builds require a working CUDA toolkit with nvcc on the PATH
GPU tests may be unavailable in documentation-only environments
Docs and workflow cleanup must still leave the docs build working and the repository layout coherent
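
One way to respect these boundaries in a local script is to decide which validation steps to run based on what the environment provides. This is a sketch under stated assumptions: `cuda_build_available` and `plan_validation` are hypothetical helpers, the step names are placeholders rather than real project targets, and finding nvcc on the PATH says nothing about whether a GPU is present.

```python
import shutil


def cuda_build_available() -> bool:
    """Return True if an nvcc executable is found on PATH."""
    return shutil.which("nvcc") is not None


def plan_validation() -> list:
    """List the validation steps that make sense in this environment.

    The step names are placeholders, not real build targets.
    """
    steps = ["docs build"]  # always expected to work
    if cuda_build_available():
        # nvcc found: a full CUDA build can be attempted.
        # GPU tests additionally need a device, which this sketch does not detect.
        steps.append("CUDA build")
    return steps


print(plan_validation())
```

In a documentation-only environment this yields only the docs build step, which matches the expectation that docs changes are still validated when no toolkit is installed.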