GPU SpMV

High-Performance CUDA Sparse Matrix-Vector Multiplication

Intelligent Kernel Selection · 70%+ Bandwidth · Production Ready

70%+ Bandwidth · CUDA 11.0+ · MIT License · C++17
example.cpp
#include <cstdio>
#include <spmv/spmv.h>

int main() {
    // `data`, `d_x`, `d_y`, and `n` are assumed to be set up by the caller (not shown).

    // Create a 10000 x 10000 CSR matrix (500000 non-zeros) and upload it to the GPU
    CSRMatrix* csr = csr_create(10000, 10000, 500000);
    csr_from_dense(csr, data, 10000, 10000);
    csr_to_gpu(csr);

    // Auto-select the kernel and execute
    SpMVConfig config = spmv_auto_config(csr);
    SpMVResult result = spmv_csr(csr, d_x, d_y, &config, n);

    // Report bandwidth utilization as a percentage
    printf("Bandwidth: %.1f%%\n",
           result.bandwidth_utilization * 100);
}
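For readers new to SpMV, the operation the example runs on the GPU is y = A·x with A stored in CSR. The following is a minimal host-side reference sketch, useful for checking GPU results on small inputs. `HostCsr` and `spmv_reference` are hypothetical names introduced here for illustration; they are not part of the library's API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side CSR container (illustration only, not the library's CSRMatrix).
struct HostCsr {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;   // size rows + 1; row r owns entries [row_ptr[r], row_ptr[r + 1])
    std::vector<int> col_idx;   // size nnz; column of each stored value
    std::vector<float> values;  // size nnz; the non-zero values
};

// Computes y = A * x one row at a time on the CPU.
std::vector<float> spmv_reference(const HostCsr& a, const std::vector<float>& x) {
    assert(static_cast<int>(x.size()) == a.cols);
    assert(static_cast<int>(a.row_ptr.size()) == a.rows + 1);
    std::vector<float> y(a.rows, 0.0f);
    for (int r = 0; r < a.rows; ++r) {
        float sum = 0.0f;
        for (int k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

Comparing a GPU result against a reference like this, within a floating-point tolerance, is a common way to validate a kernel before benchmarking it.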

Key Features

🚀

Extreme Performance

  • 4 optimized kernels with intelligent selection
  • Over 70% of theoretical peak memory bandwidth in the benchmarks below
  • Merge Path for perfect load balancing
  • ELL format with coalesced memory access
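To give a feel for what kernel selection can look like, here is a rough sketch of a heuristic driven by row-length statistics. The thresholds and the names `KernelChoice` and `choose_kernel` are assumptions made for this illustration; the library's actual selection logic is not shown on this page and may weigh other factors such as matrix size.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical labels for the three CSR kernels named on this page.
enum class KernelChoice { ScalarCsr, VectorCsr, MergePath };

// Picks a kernel from the CSR row pointer array alone.
// The thresholds (8x skew, 8 non-zeros per row) are illustrative guesses,
// not tuned values taken from the library.
KernelChoice choose_kernel(const std::vector<int>& row_ptr) {
    assert(row_ptr.size() >= 2);
    const int rows = static_cast<int>(row_ptr.size()) - 1;
    const int nnz = row_ptr.back() - row_ptr.front();
    int max_len = 0;
    for (int r = 0; r < rows; ++r) {
        max_len = std::max(max_len, row_ptr[r + 1] - row_ptr[r]);
    }
    const double mean_len = static_cast<double>(nnz) / rows;

    // A few very long rows would stall row-based kernels, so balance by work item.
    if (mean_len > 0.0 && max_len > 8.0 * mean_len) return KernelChoice::MergePath;
    // Short rows: one thread per row keeps every lane busy.
    if (mean_len < 8.0) return KernelChoice::ScalarCsr;
    // Longer, fairly even rows: a group of threads cooperates on each row.
    return KernelChoice::VectorCsr;
}
```

The general idea is that short rows suit a thread-per-row kernel, longer regular rows suit a cooperative kernel, and skewed row lengths call for a load-balanced scheme.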
📊

Multi-Format Support

  • CSR - General sparse matrices
  • ELL - High performance for matrices with uniform row lengths
  • Automatic format conversion
  • Seamless GPU/CPU switching
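As an illustration of what a CSR to ELL conversion involves, the sketch below pads every row to the longest row's length and stores the result column-major, which is the layout that lets consecutive GPU threads (one per row) read consecutive addresses. `HostEll` and `csr_to_ell_sketch` are hypothetical names, and the padding convention (column -1, value 0) is an assumption; the library's own conversion routine may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side ELL container (illustration only).
struct HostEll {
    int rows = 0;
    int width = 0;               // entries stored per row (longest row length)
    std::vector<int> col_idx;    // rows * width, column-major; -1 marks padding
    std::vector<float> values;   // rows * width, column-major; 0 in padding slots
};

// Converts CSR arrays to padded, column-major ELL.
HostEll csr_to_ell_sketch(const std::vector<int>& row_ptr,
                          const std::vector<int>& col_idx,
                          const std::vector<float>& values) {
    assert(!row_ptr.empty());
    assert(col_idx.size() == values.size());
    HostEll ell;
    ell.rows = static_cast<int>(row_ptr.size()) - 1;
    for (int r = 0; r < ell.rows; ++r) {
        ell.width = std::max(ell.width, row_ptr[r + 1] - row_ptr[r]);
    }
    const std::size_t slots = static_cast<std::size_t>(ell.rows) * ell.width;
    ell.col_idx.assign(slots, -1);
    ell.values.assign(slots, 0.0f);
    for (int r = 0; r < ell.rows; ++r) {
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            // Slot s of row r lives at s * rows + r, so rows are interleaved.
            const std::size_t slot = static_cast<std::size_t>(k - row_ptr[r]);
            const std::size_t dst = slot * ell.rows + r;
            ell.col_idx[dst] = col_idx[k];
            ell.values[dst] = values[k];
        }
    }
    return ell;
}
```

The padding is why ELL pays off mainly when row lengths are close to uniform: one long row forces every other row to carry unused slots.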
🎯

Production Quality

  • RAII resource management (CudaBuffer)
  • Semantic error codes (SpMVError)
  • Cross-platform (Linux/Windows)
  • 100+ test cases
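The RAII and error-code ideas can be sketched without a GPU. The example below is a host-memory stand-in: it uses `std::malloc` and `std::free` where a device buffer would use `cudaMalloc` and `cudaFree`. `BufferSketch` and `ErrorSketch` are hypothetical names with invented values; the real `CudaBuffer` and `SpMVError` definitions are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical error codes (the library's SpMVError values are not documented here).
enum class ErrorSketch { Ok, InvalidArgument, OutOfMemory };

// Move-only owner of one allocation; the destructor releases it automatically.
class BufferSketch {
public:
    BufferSketch() = default;
    ~BufferSketch() { release(); }
    BufferSketch(const BufferSketch&) = delete;
    BufferSketch& operator=(const BufferSketch&) = delete;
    BufferSketch(BufferSketch&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)),
          bytes_(std::exchange(other.bytes_, std::size_t{0})) {}
    BufferSketch& operator=(BufferSketch&& other) noexcept {
        if (this != &other) {
            release();
            ptr_ = std::exchange(other.ptr_, nullptr);
            bytes_ = std::exchange(other.bytes_, std::size_t{0});
        }
        return *this;
    }

    // Returns an error code instead of throwing, so callers can branch on the cause.
    ErrorSketch allocate(std::size_t bytes) {
        if (bytes == 0) return ErrorSketch::InvalidArgument;
        release();
        ptr_ = std::malloc(bytes);  // stand-in for cudaMalloc in this host-only sketch
        if (ptr_ == nullptr) return ErrorSketch::OutOfMemory;
        bytes_ = bytes;
        return ErrorSketch::Ok;
    }
    void* data() const { return ptr_; }
    std::size_t size_bytes() const { return bytes_; }

private:
    void release() {
        assert((ptr_ == nullptr) == (bytes_ == 0));  // invariant: null iff empty
        std::free(ptr_);                              // stand-in for cudaFree
        ptr_ = nullptr;
        bytes_ = 0;
    }
    void* ptr_ = nullptr;
    std::size_t bytes_ = 0;
};
```

Deleting the copy operations and keeping the move operations means exactly one object owns the allocation at any time, so it is freed once even on early returns.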

Performance

Matrix Size      Non-zeros   Kernel       Bandwidth (% of peak)
10K × 10K        500K        Vector CSR   70.2%
100K × 100K      5M          Merge Path   71.5%
1M × 1M          50M         Merge Path   70.8%

Benchmarks: NVIDIA RTX 3090 (Ampere, 936 GB/s peak memory bandwidth)
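A bandwidth percentage like the ones above is typically derived by dividing the bytes a kernel must move by its run time and comparing that to the device's peak. The sketch below uses one common traffic model for CSR SpMV; the 4-byte value and index sizes and the assumption that every array is touched exactly once are mine, and the project's own accounting may differ. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Minimum memory traffic for one CSR SpMV, assuming 4-byte floats, 4-byte
// indices, and that each array is read or written exactly once.
double csr_spmv_bytes(std::size_t rows, std::size_t cols, std::size_t nnz) {
    const double value_bytes = 4.0;
    const double index_bytes = 4.0;
    return nnz * (value_bytes + index_bytes)  // non-zero values + column indices
         + (rows + 1) * index_bytes           // row pointers
         + cols * value_bytes                 // input vector x
         + rows * value_bytes;                // output vector y
}

// Fraction of peak bandwidth achieved (0.70 means 70%).
double bandwidth_utilization(double bytes_moved, double seconds, double peak_gb_per_s) {
    assert(seconds > 0.0 && peak_gb_per_s > 0.0);
    const double achieved_gb_per_s = bytes_moved / seconds / 1e9;
    return achieved_gb_per_s / peak_gb_per_s;
}
```

Under this model the 10K × 10K case moves about 4.12 MB per multiplication, and 70.2% of 936 GB/s corresponds to roughly 657 GB/s of achieved throughput.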

Detailed benchmarks and optimization guides are on the project's Performance page.

Quick Start

Installation

git clone https://github.com/LessUp/gpu-spmv.git
cd gpu-spmv
cmake --preset release && cmake --build --preset release

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Application Layer                     │
│   PageRank  │ Iterative  │ Graph NNs  │ Scientific     │
├─────────────────────────────────────────────────────────┤
│                       API Layer                          │
│   spmv_csr  │  spmv_ell │ benchmark │   pagerank      │
├─────────────────────────────────────────────────────────┤
│                      Kernel Layer                        │
│  Scalar CSR │ Vector CSR │ Merge Path │  ELL Kernel    │
├─────────────────────────────────────────────────────────┤
│                     Storage Layer                        │
│              CSR Matrix      │      ELL Matrix         │
└─────────────────────────────────────────────────────────┘
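Of the kernels in the Kernel Layer, Merge Path is the least obvious, so here is a sequential CPU sketch of the general technique: treat the row end offsets and the non-zero indices as two sorted lists, cut their merge path into equal-length pieces, and let each piece (one GPU thread in a real kernel) process its share, with a fix-up pass for rows split across pieces. All names are hypothetical and this is not the library's kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Position on the merge path: `row` row ends and `nz` non-zeros consumed so far.
struct PathCoord { int row; int nz; };

// Binary search along one diagonal (row + nz == diagonal).
// row_end[r] = row_ptr[r + 1]; the second list is implicitly 0, 1, ..., nnz - 1.
PathCoord merge_path_search(int diagonal, const int* row_end, int rows, int nnz) {
    int lo = std::max(diagonal - nnz, 0);
    int hi = std::min(diagonal, rows);
    while (lo < hi) {
        const int mid = (lo + hi) / 2;
        if (row_end[mid] <= diagonal - mid - 1) lo = mid + 1;
        else hi = mid;
    }
    return {lo, diagonal - lo};
}

std::vector<float> spmv_merge_path_sketch(const std::vector<int>& row_ptr,
                                          const std::vector<int>& col_idx,
                                          const std::vector<float>& values,
                                          const std::vector<float>& x,
                                          int num_parts) {
    assert(num_parts > 0 && !row_ptr.empty());
    const int rows = static_cast<int>(row_ptr.size()) - 1;
    const int nnz = static_cast<int>(values.size());
    const int* row_end = row_ptr.data() + 1;
    const int total = rows + nnz;                       // merge path length
    const int per_part = (total + num_parts - 1) / num_parts;
    std::vector<float> y(rows, 0.0f);
    std::vector<int> carry_row(num_parts, rows);        // row left unfinished by each part
    std::vector<float> carry_val(num_parts, 0.0f);      // its partial sum

    for (int p = 0; p < num_parts; ++p) {               // one iteration = one GPU thread
        PathCoord cur = merge_path_search(std::min(p * per_part, total), row_end, rows, nnz);
        const PathCoord end =
            merge_path_search(std::min((p + 1) * per_part, total), row_end, rows, nnz);
        float sum = 0.0f;
        for (; cur.row < end.row; ++cur.row) {          // rows that finish in this part
            for (; cur.nz < row_end[cur.row]; ++cur.nz)
                sum += values[cur.nz] * x[col_idx[cur.nz]];
            y[cur.row] = sum;
            sum = 0.0f;
        }
        for (; cur.nz < end.nz; ++cur.nz)               // leading part of an unfinished row
            sum += values[cur.nz] * x[col_idx[cur.nz]];
        carry_row[p] = end.row;
        carry_val[p] = sum;
    }
    for (int p = 0; p < num_parts; ++p)                 // fix-up for rows split across parts
        if (carry_row[p] < rows) y[carry_row[p]] += carry_val[p];
    return y;
}
```

Because every piece covers the same number of path steps (row ends plus non-zeros), no piece is overloaded by a single long row, which is the load-balancing property the feature list refers to.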

Full details are on the project's Architecture page.

Use Cases

🕸️ Graph Algorithms: PageRank, shortest path, community detection
🔬 Scientific Computing: Finite element analysis, CFD
🤖 Machine Learning: Sparse neural networks, recommendations
📊 Data Analytics: Matrix factorization, eigenvalue computation
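To show how a use case such as PageRank reduces to repeated SpMV, here is a small host-side power iteration sketch. It assumes the CSR arrays hold the column-stochastic transition matrix M (entry M[i][j] = 1/outdegree(j) when page j links to page i) and that the graph has no dangling nodes. `pagerank_sketch` is a hypothetical name and is unrelated to the library's `pagerank` entry point.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// PageRank by power iteration: rank <- damping * (M * rank) + (1 - damping) / n.
// Stops after max_iters iterations or when the L1 change drops below tol.
std::vector<float> pagerank_sketch(const std::vector<int>& row_ptr,
                                   const std::vector<int>& col_idx,
                                   const std::vector<float>& values,
                                   float damping, int max_iters, float tol) {
    assert(row_ptr.size() >= 2);
    const int n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<float> rank(n, 1.0f / n);
    std::vector<float> next(n, 0.0f);
    for (int it = 0; it < max_iters; ++it) {
        float delta = 0.0f;
        for (int i = 0; i < n; ++i) {
            // Inner loop is one row of the SpMV next = M * rank.
            float sum = 0.0f;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                sum += values[k] * rank[col_idx[k]];
            }
            next[i] = damping * sum + (1.0f - damping) / n;
            delta += std::fabs(next[i] - rank[i]);
        }
        rank.swap(next);
        if (delta < tol) break;
    }
    return rank;
}
```

Each iteration is dominated by the matrix-vector product, which is why a fast SpMV kernel translates almost directly into faster PageRank on large graphs.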


🇨🇳 View the Chinese version

GPU SpMV © 2024-2026 LessUp | MIT License