
Architecture Overview

GPU SpMV now keeps the architecture deliberately small: sparse storage, kernel execution, and a narrow public API.

System Architecture

Design Principles

| Principle | Implementation | Benefit |
|---|---|---|
| Layered Architecture | Storage and compute remain separated | Easier maintenance |
| Strategy Selection | Kernel choice based on matrix statistics | Predictable execution |
| RAII Management | CudaBuffer<T> and execution contexts | Safer resource lifetime |
| Semantic Errors | SpMVError and explicit return values | Clear diagnostics |
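The RAII and semantic-error principles can be illustrated with a small host-only sketch. It is not the library's CudaBuffer<T> or SpMVError: the names BufferSketch, SketchError, and HostAllocator are hypothetical, and host memory stands in for device memory so the example runs without a GPU. A device version would presumably wrap cudaMalloc and cudaFree in the allocator instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical error codes for illustration; the real SpMVError values
// are not listed on this page.
enum class SketchError { Success, InvalidDimension, AllocationFailed };

// Stand-in allocator backed by host memory. A device allocator would call
// cudaMalloc / cudaFree here instead (assumption, not the library's code).
struct HostAllocator {
    static int& live() { static int count = 0; return count; }  // live allocations
    static void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p) ++live();
        return p;
    }
    static void release(void* p) {
        if (p) { std::free(p); --live(); }
    }
};

// Move-only owner: the destructor always releases the memory, so early
// returns on error paths cannot leak it.
template <typename T, typename Alloc = HostAllocator>
class BufferSketch {
public:
    BufferSketch() = default;
    ~BufferSketch() { reset(); }

    BufferSketch(const BufferSketch&) = delete;
    BufferSketch& operator=(const BufferSketch&) = delete;

    BufferSketch(BufferSketch&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    BufferSketch& operator=(BufferSketch&& other) noexcept {
        if (this != &other) {
            reset();
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    // Reports failure through an explicit return value rather than an exception.
    SketchError allocate(std::size_t count) {
        reset();
        if (count == 0) return SketchError::InvalidDimension;
        void* p = Alloc::allocate(count * sizeof(T));
        if (!p) return SketchError::AllocationFailed;
        data_ = static_cast<T*>(p);
        size_ = count;
        return SketchError::Success;
    }

    T* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    void reset() {
        Alloc::release(data_);
        data_ = nullptr;
        size_ = 0;
    }

    T* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Making the allocator a template parameter is a choice made for this sketch only: it lets the ownership logic be exercised on a machine without CUDA, and the live-allocation counter makes leaks visible in tests.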

Core Layers

Storage Layer

  • CSR Matrix — general-purpose sparse format
  • ELL Matrix — column-major layout for regular sparsity
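
To make the two layouts concrete, here is a minimal host-side sketch of CSR and column-major ELL storage, with a conversion between them. The struct names, field names, and the use of -1 as the padding column index are assumptions for illustration; the library's actual types may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CSR container: row i owns entries row_ptr[i] .. row_ptr[i+1]-1.
struct CsrSketch {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;    // size rows + 1
    std::vector<int> col_idx;    // size nnz
    std::vector<float> values;   // size nnz
};

// Hypothetical ELL container: every row is padded to `width` entries.
// Column-major means entry j of row i lives at index j * rows + i, so
// consecutive rows (consecutive GPU threads) touch consecutive addresses.
struct EllSketch {
    int rows = 0;
    int cols = 0;
    int width = 0;               // longest row in the matrix
    std::vector<int> col_idx;    // size rows * width, -1 marks padding
    std::vector<float> values;   // size rows * width, 0 in padding slots
};

EllSketch csr_to_ell(const CsrSketch& a) {
    EllSketch e;
    e.rows = a.rows;
    e.cols = a.cols;
    for (int i = 0; i < a.rows; ++i) {
        e.width = std::max(e.width, a.row_ptr[i + 1] - a.row_ptr[i]);
    }
    const std::size_t slots = static_cast<std::size_t>(e.rows) * e.width;
    e.col_idx.assign(slots, -1);
    e.values.assign(slots, 0.0f);
    for (int i = 0; i < a.rows; ++i) {
        int j = 0;  // position within row i
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k, ++j) {
            const std::size_t slot = static_cast<std::size_t>(j) * e.rows + i;
            e.col_idx[slot] = a.col_idx[k];
            e.values[slot] = a.values[k];
        }
    }
    return e;
}

// 3x3 example matrix:
//   [1 0 2]
//   [0 3 0]
//   [4 5 6]
CsrSketch make_example() {
    CsrSketch a;
    a.rows = 3;
    a.cols = 3;
    a.row_ptr = {0, 2, 3, 6};
    a.col_idx = {0, 2, 1, 0, 1, 2};
    a.values = {1, 2, 3, 4, 5, 6};
    return a;
}
```

For the example matrix, the row lengths are 2, 1, and 3, so ELL pads every row to width 3 and stores three padding slots. That overhead is why ELL suits matrices with regular sparsity, while CSR remains the general-purpose choice.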

Kernel Layer

| Kernel | Thread Strategy | Best For | Bandwidth utilization |
|---|---|---|---|
| Scalar CSR | 1 thread/row | Very sparse (nnz/row < 4) | ~40-50% |
| Vector CSR | 1 warp/row | Uniform distribution | ~65-75% |
| Merge Path | Dynamic partitioning | Highly skewed | ~70-80% |
| ELL Kernel | Column parallel | Uniform row lengths | ~80-90% |
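
The difference between the "1 thread/row" and "1 warp/row" strategies can be sketched on the host. The code below is a CPU emulation written for illustration, not the library's kernels: the function names are hypothetical, and the 32-lane loop with a tree reduction only mimics how a warp would split one row and combine partial sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar CSR mapping: each iteration of the outer loop is the work that a
// single GPU thread would do for its row.
std::vector<float> spmv_csr_thread_per_row(const std::vector<int>& row_ptr,
                                           const std::vector<int>& col_idx,
                                           const std::vector<float>& values,
                                           const std::vector<float>& x) {
    assert(!row_ptr.empty());
    const std::size_t rows = row_ptr.size() - 1;
    std::vector<float> y(rows, 0.0f);
    for (std::size_t row = 0; row < rows; ++row) {
        float sum = 0.0f;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[row] = sum;
    }
    return y;
}

// Vector CSR mapping, emulated: 32 lanes stride across one row, then the
// partial sums are combined with a halving tree reduction.
std::vector<float> spmv_csr_warp_per_row(const std::vector<int>& row_ptr,
                                         const std::vector<int>& col_idx,
                                         const std::vector<float>& values,
                                         const std::vector<float>& x) {
    assert(!row_ptr.empty());
    const int kWarpSize = 32;
    const std::size_t rows = row_ptr.size() - 1;
    std::vector<float> y(rows, 0.0f);
    for (std::size_t row = 0; row < rows; ++row) {
        float partial[kWarpSize] = {0.0f};
        for (int lane = 0; lane < kWarpSize; ++lane) {
            for (int k = row_ptr[row] + lane; k < row_ptr[row + 1]; k += kWarpSize) {
                partial[lane] += values[k] * x[col_idx[k]];
            }
        }
        for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
            for (int lane = 0; lane < offset; ++lane) {
                partial[lane] += partial[lane + offset];
            }
        }
        y[row] = partial[0];
    }
    return y;
}
```

With rows shorter than 4 entries, most of the 32 lanes in the warp-per-row version have nothing to do, which is consistent with the table recommending the scalar mapping for very sparse matrices.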

API Layer

  • spmv_csr() — CSR format execution
  • spmv_ell() — ELL format execution
  • spmv_auto_config() — kernel auto-selection

The three most important ideas on this page

  1. Data flows from sparse storage to a chosen kernel and then to validated output.
  2. Kernel selection is explicit, driven by avg_nnz_per_row and skewness.
  3. Reliability is engineered, not implied, through RAII, semantic errors, and focused tests.
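
The selection idea in point 2 might look roughly like the sketch below. Only the "nnz/row < 4" cutoff comes from the kernel table on this page. Everything else is an assumption made for illustration: skewness is defined here as the longest row divided by the average row length, the threshold of 8 is arbitrary, the skew check is applied first, and the type and function names are hypothetical rather than the signature of spmv_auto_config().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class CsrKernelChoice { Scalar, Vector, MergePath };

struct RowStats {
    double avg_nnz_per_row;
    double skewness;  // assumed definition: max row length / average row length
};

// Derives both statistics from the CSR row pointer array alone.
RowStats compute_row_stats(const std::vector<int>& row_ptr) {
    RowStats stats{0.0, 0.0};
    if (row_ptr.size() < 2) return stats;  // no rows
    const std::size_t rows = row_ptr.size() - 1;
    int max_len = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        max_len = std::max(max_len, row_ptr[i + 1] - row_ptr[i]);
    }
    const int nnz = row_ptr.back() - row_ptr.front();
    stats.avg_nnz_per_row = static_cast<double>(nnz) / static_cast<double>(rows);
    if (stats.avg_nnz_per_row > 0.0) {
        stats.skewness = static_cast<double>(max_len) / stats.avg_nnz_per_row;
    }
    return stats;
}

// skew_threshold is a placeholder value, not a tuned or documented constant.
CsrKernelChoice choose_csr_kernel(const RowStats& stats, double skew_threshold = 8.0) {
    if (stats.skewness > skew_threshold) return CsrKernelChoice::MergePath;
    if (stats.avg_nnz_per_row < 4.0) return CsrKernelChoice::Scalar;
    return CsrKernelChoice::Vector;
}
```

Keeping the statistics in a plain struct, separate from the decision function, means the choice can be logged and unit-tested without launching a kernel.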

MIT License