🇨🇳 简体中文

Architecture

System architecture, core algorithms, and design decisions.

Table of Contents

  1. System Architecture
  2. Sparse Matrix Formats
    1. CSR (Compressed Sparse Row)
    2. ELL (ELLPACK)
  3. Kernel Selection Strategy
    1. Kernel Comparison
  4. Design Decisions
    1. 1. RAII Resource Management
    2. 2. Execution Context
    3. 3. Error Handling

System Architecture

1
2
3
4
5
6
7
8
9
10
11
12
13
┌─────────────────────────────────────────────────────────┐
│                    Application Layer                     │
│   PageRank  │  Iterative  │  Graph NNs  │  Scientific  │
├─────────────────────────────────────────────────────────┤
│                       API Layer                          │
│   spmv_csr  │  spmv_ell  │  benchmark  │  pagerank    │
├─────────────────────────────────────────────────────────┤
│                      Kernel Layer                        │
│  Scalar CSR │ Vector CSR │ Merge Path │  ELL Kernel   │
├─────────────────────────────────────────────────────────┤
│                      Storage Layer                       │
│              CSR Matrix      │      ELL Matrix         │
└─────────────────────────────────────────────────────────┘

Design Principles:

  • Layered Architecture: Clear separation of storage, compute, and application layers
  • Strategy Pattern: Pluggable kernel selection strategies
  • RAII Management: Automatic resource lifecycle management

Sparse Matrix Formats

CSR (Compressed Sparse Row)

1
2
3
4
5
6
Matrix:            CSR Storage:
┌─────┬─────┐     values:      [1, 2, 3, 4, 5]
│ 1 0 2 │     col_indices: [0, 2, 1, 2, 3]
│ 0 3 4 │ =>  row_ptrs:    [0, 2, 4, 5]
│ 0 0 5 │     
└─────┴─────┘     row_ptrs[i] indicates row i's start

Characteristics:

  • General format, efficient storage
  • Suitable for irregular sparsity patterns
  • Efficient row-wise traversal

ELL (ELLPACK)

1
2
3
4
5
6
Matrix:            ELL Storage (column-major):
┌─────┬─────┐     values:      [1, 3, 5, 2, 4, 0]
│ 1 0 2 │     col_indices: [0, 1, 3, 2, 2, -]
│ 0 3 4 │     
│ 0 0 5 │     Padded to max row length for coalesced access
└─────┴─────┘

Characteristics:

  • Column-major storage, coalesced access
  • Suitable for uniform row lengths
  • Best performance on GPU

Kernel Selection Strategy

1
2
3
4
5
6
7
8
9
10
11
12
Input: CSR Matrix
       │
       ├── avg_nnz_per_row < 4 ──→ Scalar CSR
       │                            (1 thread/row, minimal overhead)
       │
       └── avg_nnz_per_row >= 4
               │
               ├── skewness < 10 ──→ Vector CSR
               │                      (Warp cooperation, balanced)
               │
               └── skewness >= 10 ──→ Merge Path
                                      (Perfect load balancing)

Kernel Comparison

Kernel Threading Best For Bandwidth
Scalar CSR 1 thread/row Very sparse Medium
Vector CSR 1 warp/row Uniform dist High
Merge Path Dynamic High skewness Highest
ELL Column-parallel Uniform rows Extreme

Design Decisions

1. RAII Resource Management

1
2
3
4
5
6
7
// Automatic lifecycle management, prevents memory leaks
class CudaBuffer {
public:
    explicit CudaBuffer(size_t n) { cudaMalloc(&ptr_, n * sizeof(T)); }
    ~CudaBuffer() { cudaFree(ptr_); }
    // Disable copy, enable move
};

2. Execution Context

1
2
3
4
5
6
// Cache texture objects to avoid recreation
SpMVExecutionContext ctx;
for (int i = 0; i < n_iter; i++) {
    spmv_csr(csr, d_x, d_y, &config, n, &ctx);
    // Texture objects are reused
}

3. Error Handling

1
2
3
4
5
6
7
// Semantic error codes
enum class SpMVError {
    SUCCESS = 0,
    INVALID_DIMENSION = -1,
    CUDA_MALLOC = -2,
    // ...
};

More details in specs/ directory