Architecture Overview
GPU SpMV now keeps the architecture deliberately small: sparse storage, kernel execution, and a narrow public API.
System Architecture
Design Principles
| Principle | Implementation | Benefit |
|---|---|---|
| Layered Architecture | Storage and compute remain separated | Easier maintenance |
| Strategy Selection | Kernel choice based on matrix statistics | Predictable execution |
| RAII Management | CudaBuffer<T> and execution contexts | Safer resource lifetime |
| Semantic Errors | SpMVError and explicit return values | Clear diagnostics |
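The RAII and semantic-error rows above can be illustrated with a small host-side sketch. It is not the project's `CudaBuffer<T>`: `HostBuffer<T>` is a hypothetical stand-in that uses `std::malloc` in place of device allocation, and the `SpMVError` enumerators shown here are assumed for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical error codes; the real SpMVError may define different values.
enum class SpMVError { Success, InvalidArgument, OutOfMemory };

// Host-memory stand-in for a CudaBuffer<T>-style owner:
// move-only, and the allocation is released in the destructor.
template <typename T>
class HostBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw storage is only safe for trivially copyable types");

public:
    HostBuffer() = default;
    ~HostBuffer() { std::free(data_); }

    // Copying is disabled so exactly one object owns the allocation.
    HostBuffer(const HostBuffer&) = delete;
    HostBuffer& operator=(const HostBuffer&) = delete;

    HostBuffer(HostBuffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    HostBuffer& operator=(HostBuffer&& other) noexcept {
        if (this != &other) {
            std::free(data_);
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    // Failure is reported through an explicit return value, not an exception.
    SpMVError allocate(std::size_t count) {
        if (count == 0) return SpMVError::InvalidArgument;
        void* p = std::malloc(count * sizeof(T));
        if (p == nullptr) return SpMVError::OutOfMemory;
        std::free(data_);  // release any previous allocation
        data_ = static_cast<T*>(p);
        size_ = count;
        return SpMVError::Success;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    T* data_ = nullptr;
    std::size_t size_ = 0;
};
```

A caller would check the result of `allocate()` before touching the buffer. Because the type is move-only, ownership can be handed to another scope without a second free.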
Core Layers
Storage Layer
- CSR Matrix — general-purpose sparse format
- ELL Matrix — column-major layout for regular sparsity
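As a rough illustration of the two formats, the sketch below defines minimal CSR and ELL containers and converts one to the other. The struct and field names (`CsrMatrix`, `EllMatrix`, `row_ptr`, `width`) and the `-1` padding marker are assumptions made for this example, not the project's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed CSR layout: row_ptr has rows + 1 entries, and
// col_idx and values each have one entry per nonzero.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<float> values;
};

// Assumed ELL layout: every row is padded to `width` slots and stored
// column-major, so slot k of row r lives at index k * rows + r.
struct EllMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::size_t width = 0;      // longest row in the matrix
    std::vector<int> col_idx;   // -1 marks a padding slot
    std::vector<float> values;  // 0.0f in padding slots
};

EllMatrix csr_to_ell(const CsrMatrix& csr) {
    assert(csr.row_ptr.size() == csr.rows + 1);
    EllMatrix ell;
    ell.rows = csr.rows;
    ell.cols = csr.cols;

    // The ELL width is the length of the longest row.
    for (std::size_t r = 0; r < csr.rows; ++r) {
        const std::size_t len =
            static_cast<std::size_t>(csr.row_ptr[r + 1] - csr.row_ptr[r]);
        ell.width = std::max(ell.width, len);
    }

    ell.col_idx.assign(ell.rows * ell.width, -1);
    ell.values.assign(ell.rows * ell.width, 0.0f);

    // Scatter each row's entries into its column-major slots.
    for (std::size_t r = 0; r < csr.rows; ++r) {
        const std::size_t begin = static_cast<std::size_t>(csr.row_ptr[r]);
        const std::size_t end = static_cast<std::size_t>(csr.row_ptr[r + 1]);
        for (std::size_t k = 0; begin + k < end; ++k) {
            const std::size_t dst = k * ell.rows + r;
            ell.col_idx[dst] = csr.col_idx[begin + k];
            ell.values[dst] = csr.values[begin + k];
        }
    }
    return ell;
}
```

The column-major ordering places slot `k` of neighboring rows next to each other in memory, which is why the format suits matrices whose rows have similar lengths: little storage is spent on padding.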
Kernel Layer
| Kernel | Thread Strategy | Best For | Bandwidth Utilization |
|---|---|---|---|
| Scalar CSR | 1 thread/row | Very sparse (nnz/row < 4) | ~40-50% |
| Vector CSR | 1 warp/row | Uniform distribution | ~65-75% |
| Merge Path | Dynamic partitioning | Highly skewed | ~70-80% |
| ELL Kernel | Column parallel | Uniform row lengths | ~80-90% |
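To make the first two thread strategies in the table concrete, here is a CPU-only reference sketch. `row_dot_scalar` mirrors the work a single thread would do for one row, and `row_dot_vector` emulates 32 lanes striding across a row followed by a tree reduction. All names are hypothetical, and the real kernels run on the GPU, so this only models the access pattern and says nothing about performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container assumed for this example.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<float> values;
};

constexpr int kLanes = 32;  // lanes per warp on current NVIDIA GPUs

// Scalar strategy: one worker walks the whole row sequentially.
float row_dot_scalar(const CsrMatrix& a, std::size_t row,
                     const std::vector<float>& x) {
    float sum = 0.0f;
    for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
        sum += a.values[k] * x[a.col_idx[k]];
    }
    return sum;
}

// Vector strategy: kLanes workers stride across the row, then the
// per-lane partial sums are combined with a tree reduction.
float row_dot_vector(const CsrMatrix& a, std::size_t row,
                     const std::vector<float>& x) {
    float partial[kLanes] = {};
    const int begin = a.row_ptr[row];
    const int end = a.row_ptr[row + 1];
    for (int lane = 0; lane < kLanes; ++lane) {
        for (int k = begin + lane; k < end; k += kLanes) {
            partial[lane] += a.values[k] * x[a.col_idx[k]];
        }
    }
    for (int offset = kLanes / 2; offset > 0; offset /= 2) {
        for (int lane = 0; lane < offset; ++lane) {
            partial[lane] += partial[lane + offset];
        }
    }
    return partial[0];
}

// Computes y = A * x with the chosen per-row strategy.
std::vector<float> spmv_reference(const CsrMatrix& a,
                                  const std::vector<float>& x,
                                  bool use_vector) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(x.size() == a.cols);
    std::vector<float> y(a.rows, 0.0f);
    for (std::size_t r = 0; r < a.rows; ++r) {
        y[r] = use_vector ? row_dot_vector(a, r, x) : row_dot_scalar(a, r, x);
    }
    return y;
}
```

With only a few nonzeros per row, most of the 32 lanes in the vector variant have nothing to do, which is consistent with the table recommending the scalar kernel for very sparse rows.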
API Layer
- `spmv_csr()` — CSR format execution
- `spmv_ell()` — ELL format execution
- `spmv_auto_config()` — kernel auto-selection
The three most important ideas on this page
- Data flows from sparse storage to a chosen kernel and then to validated output.
- Kernel selection is explicit, driven by `avg_nnz_per_row` and skewness.
- Reliability is engineered, not implied, through RAII, semantic errors, and focused tests.
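The selection idea can be sketched as a small decision function over matrix statistics. This is an assumption-heavy illustration: the page does not define skewness, so the sketch takes it to be the longest row divided by the average row length. The skew threshold of 4.0 and the order of the checks are hypothetical, and only the `nnz/row < 4` cutoff comes from the kernel table. `KernelKind`, `MatrixStats`, `compute_stats`, and `select_kernel` are illustrative names, not the project's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class KernelKind { ScalarCsr, VectorCsr, MergePath };

struct MatrixStats {
    double avg_nnz_per_row = 0.0;
    double skewness = 0.0;  // assumed definition: max row length / average row length
};

// Derives both statistics from a CSR row-pointer array alone.
MatrixStats compute_stats(const std::vector<int>& row_ptr) {
    assert(!row_ptr.empty());
    MatrixStats stats;
    const std::size_t rows = row_ptr.size() - 1;
    if (rows == 0) return stats;

    int max_len = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        max_len = std::max(max_len, row_ptr[r + 1] - row_ptr[r]);
    }
    const int nnz = row_ptr.back() - row_ptr.front();
    stats.avg_nnz_per_row = static_cast<double>(nnz) / static_cast<double>(rows);
    if (stats.avg_nnz_per_row > 0.0) {
        stats.skewness = static_cast<double>(max_len) / stats.avg_nnz_per_row;
    }
    return stats;
}

KernelKind select_kernel(const MatrixStats& stats) {
    constexpr double kSkewThreshold = 4.0;    // hypothetical cutoff for "highly skewed"
    constexpr double kSparseThreshold = 4.0;  // "nnz/row < 4" from the kernel table

    // Checking skew first is an arbitrary choice made for this sketch.
    if (stats.skewness > kSkewThreshold) return KernelKind::MergePath;
    if (stats.avg_nnz_per_row < kSparseThreshold) return KernelKind::ScalarCsr;
    return KernelKind::VectorCsr;
}
```

Because the decision depends only on two numbers computed from `row_ptr`, the same matrix always maps to the same kernel, which is what makes the selection explicit and easy to test.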