GPU SpMV
High-Performance CUDA Sparse Matrix-Vector Multiplication
Intelligent Kernel Selection · 70%+ Bandwidth · Production Ready

CUDA 11.0+ | C++17 | MIT License
example.cpp

```cpp
#include <spmv/spmv.h>

int main() {
    // Create a 10K x 10K sparse matrix with 500K non-zeros
    // (data is a dense host array supplied by the caller)
    CSRMatrix* csr = csr_create(10000, 10000, 500000);
    csr_from_dense(csr, data, 10000, 10000);
    csr_to_gpu(csr);

    // Auto-select the optimal kernel and run y = A*x
    // (d_x, d_y are device vectors; n is the vector length)
    SpMVConfig config = spmv_auto_config(csr);
    SpMVResult result = spmv_csr(csr, d_x, d_y, &config, n);

    // Expect 70%+ bandwidth utilization
    printf("Bandwidth: %.1f%%\n",
           result.bandwidth_utilization * 100);
}
```

Key Features
Extreme Performance
- 4 optimized kernels with intelligent auto-selection
- Up to 70%+ of theoretical peak memory bandwidth
- Merge Path kernel for perfect load balancing across threads
- ELL format with fully coalesced memory access
Performance
| Matrix Size | Non-zeros | Selected Kernel | Bandwidth Utilization |
|---|---|---|---|
| 10K × 10K | 500K | Vector CSR | 70.2% |
| 100K × 100K | 5M | Merge Path | 71.5% |
| 1M × 1M | 50M | Merge Path | 70.8% |

Benchmarks: NVIDIA RTX 3090 (Ampere, 936 GB/s peak)
Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   Application Layer                     │
│   PageRank │ Iterative │ Graph NNs │ Scientific         │
├─────────────────────────────────────────────────────────┤
│                      API Layer                          │
│   spmv_csr │ spmv_ell │ benchmark │ pagerank            │
├─────────────────────────────────────────────────────────┤
│                     Kernel Layer                        │
│   Scalar CSR │ Vector CSR │ Merge Path │ ELL Kernel     │
├─────────────────────────────────────────────────────────┤
│                    Storage Layer                        │
│            CSR Matrix │ ELL Matrix                      │
└─────────────────────────────────────────────────────────┘
```
Use Cases
- 🕸️ Graph Algorithms: PageRank, shortest paths, community detection
- 🔬 Scientific Computing: finite element analysis, CFD
- 🤖 Machine Learning: sparse neural networks, recommender systems
- 📊 Data Analytics: matrix factorization, eigenvalue computation
GPU SpMV © 2024-2026 LessUp | MIT License