GPU SpMV
High-Performance CUDA Sparse Matrix-Vector Multiplication
Intelligent Kernel Selection · 70%+ Bandwidth · Production Ready

CUDA 11.0+ | C++17 | MIT License
example.cpp

```cpp
#include <spmv/spmv.h>

int main() {
    // Create a 10K x 10K sparse matrix with 500K non-zeros
    // (data is a dense host array supplied by the caller)
    CSRMatrix* csr = csr_create(10000, 10000, 500000);
    csr_from_dense(csr, data, 10000, 10000);
    csr_to_gpu(csr);

    // Auto-select the optimal kernel and run y = A*x
    // (d_x, d_y are device vectors; n is the vector length)
    SpMVConfig config = spmv_auto_config(csr);
    SpMVResult result = spmv_csr(csr, d_x, d_y, &config, n);

    // Expect 70%+ bandwidth utilization
    printf("Bandwidth: %.1f%%\n",
           result.bandwidth_utilization * 100);
}
```

Key Features
Extreme Performance
- 4 optimized kernels with intelligent auto-selection
- Up to 70%+ of theoretical peak memory bandwidth
- Merge Path kernel for perfect load balancing across threads
- ELL format with fully coalesced memory access
Performance
| Matrix Size | Non-zeros | Selected Kernel | Bandwidth Utilization |
|---|---|---|---|
| 10K × 10K | 500K | Vector CSR | 70.2% |
| 100K × 100K | 5M | Merge Path | 71.5% |
| 1M × 1M | 50M | Merge Path | 70.8% |

Benchmarks: NVIDIA RTX 3090 (Ampere, 936 GB/s peak)
Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   Application Layer                     │
│   PageRank │ Iterative │ Graph NNs │ Scientific         │
├─────────────────────────────────────────────────────────┤
│                      API Layer                          │
│   spmv_csr │ spmv_ell │ benchmark │ pagerank            │
├─────────────────────────────────────────────────────────┤
│                     Kernel Layer                        │
│   Scalar CSR │ Vector CSR │ Merge Path │ ELL Kernel     │
├─────────────────────────────────────────────────────────┤
│                    Storage Layer                        │
│            CSR Matrix │ ELL Matrix                      │
└─────────────────────────────────────────────────────────┘
```
Use Cases
- 🕸️ Graph Algorithms: PageRank, shortest paths, community detection
- 🔬 Scientific Computing: finite element analysis, CFD
- 🤖 Machine Learning: sparse neural networks, recommender systems
- 📊 Data Analytics: matrix factorization, eigenvalue computation
GPU SpMV © 2024-2026 LessUp | MIT License