Evidence
This section is the proof surface for TensorCraft-HPC.
Performance claims are paired with method, caveats, and source pages.
- Method: Benchmark summaries, methodology notes, and cross-links to references.
- Source: Benchmarks, whitepaper, and references routes.
Performance summary
Performance Benchmarks
Relative performance compared to NVIDIA libraries on A100 80GB (FP16 Tensor Core)
- GEMM (FP16) vs cuBLAS: Tensor Core enabled
- FlashAttention vs cuDNN: Memory-efficient tiling
- LayerNorm vs cuDNN: Fused kernel
- Conv2D vs cuDNN: Im2Col optimization
- SpMV (CSR) vs cuSPARSE: CSR format
Summary: 88% average, 95% best, 5 kernels.
Benchmarks run on A100 80GB, CUDA 12.4, Tensor Core enabled
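The summary figures are expressed relative to a vendor baseline. This page does not state the exact formula, so the following is only a minimal sketch of one common convention: relative performance as baseline time divided by kernel time, with the average taken as an arithmetic mean. The function names, the dictionary layout, and the timings are hypothetical placeholders, not TensorCraft-HPC APIs or measured results.

```python
from statistics import mean


def relative_performance(kernel_ms: float, baseline_ms: float) -> float:
    """Return kernel throughput as a percentage of the baseline's.

    Assumes both timings cover the same workload, so throughput is
    inversely proportional to elapsed time.
    """
    if kernel_ms <= 0 or baseline_ms <= 0:
        raise ValueError("timings must be positive")
    return 100.0 * baseline_ms / kernel_ms


def summarize(timings: dict) -> dict:
    """Map {name: (kernel_ms, baseline_ms)} to average, best, and count."""
    scores = [relative_performance(k, b) for k, b in timings.values()]
    return {"average": mean(scores), "best": max(scores), "kernels": len(scores)}


# Placeholder timings in milliseconds (assumption, NOT benchmark data).
example = {
    "kernel_a": (2.0, 1.5),  # 75.0% of baseline
    "kernel_b": (4.0, 3.5),  # 87.5% of baseline
}
summary = summarize(example)
```

Under this convention a value below 100% means the kernel is slower than the baseline library on that workload. If the project defines the ratio differently (for example from achieved FLOP/s), the same structure applies with a different `relative_performance`.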
What belongs here
- Benchmarks for kernel-level results
- Papers and citations for research lineage
- Related resources for ecosystem comparison and further reading