Getting Started

This document helps you quickly set up the environment and run your first benchmark.


Requirements

Component       Version          Check Command
----------------------------------------------------
CUDA Toolkit    12.x             nvcc --version
CMake           3.18+            cmake --version
C++ Compiler    C++17 support    g++ --version
GPU             SM 7.0+          nvidia-smi
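
The check commands in the table can be run in one pass. The following is a minimal sketch, assuming bash; `check_tool` is a hypothetical helper name and not part of the repository. It only reports whether each tool is on PATH, so the version requirements still need to be compared by eye.

```bash
#!/usr/bin/env bash
# Report whether each tool from the requirements table is on PATH.
# check_tool is a hypothetical helper, not something the project ships.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
        return 1
    fi
}

missing=0
for tool in nvcc cmake g++ nvidia-smi; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If any tool is reported missing, install it before moving on to the build steps.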

Step 1: Clone Repository

bash
git clone https://github.com/LessUp/mini-inference-engine.git
cd mini-inference-engine

Step 2: Debug Build + Tests

The gcc-cuda preset builds with the system GCC 12 / G++ 12. Use it when your shell has Conda or another custom C++ toolchain active, so the build does not pick up that toolchain's compiler.

bash
# Configure Debug build with system GCC 12 / G++ 12
cmake --preset gcc-cuda

# Build
cmake --build --preset gcc-cuda

# Run tests
ctest --preset gcc-cuda

Expected output (paths and timings will differ on your machine):

Test project /path/to/mini-inference-engine/build-gcc-cuda
    Start 1: test_config
1/8 Test #1: test_config .....................   Passed    0.01 sec
    Start 2: test_logger
2/8 Test #2: test_logger .....................   Passed    0.01 sec
...
8/8 Test #8: test_gemm .......................   Passed    0.52 sec

100% tests passed, 0 tests failed out of 8

Step 3: Release Build + Benchmark

bash
# Configure Release build with system GCC 12 / G++ 12
cmake --preset release-gcc-cuda

# Build
cmake --build --preset release-gcc-cuda

# Run benchmark
./build-release-gcc-cuda/benchmark

Expected output (sample from an RTX 3080; your numbers will vary with the GPU):

=== Mini-Inference Engine Benchmark ===
GPU: NVIDIA GeForce RTX 3080
Matrix size: 1024 x 1024

Kernel              Time (ms)   TFLOPS    vs cuBLAS
----------------------------------------------------
Naive               15.23       0.14      10.2%
Tiled               7.61        0.28      20.4%
Coalesced           6.12        0.35      25.3%
Double Buffer       3.85        0.56      40.8%
Register Blocked    1.82        1.19      86.5%
Vectorized          1.71        1.25      91.2%
cuBLAS              1.56        1.37      100.0%
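
The TFLOPS column can be sanity-checked from the Time column. This is a minimal sketch, assuming the benchmark counts 2·N³ floating-point operations for an N x N matrix multiply (the usual GEMM convention; the page does not state the formula). `gemm_tflops` is a hypothetical helper name.

```bash
# gemm_tflops N TIME_MS
# Prints the throughput in TFLOPS for an N x N x N GEMM,
# assuming 2*N^3 floating-point operations (an assumption, see above).
gemm_tflops() {
    awk -v n="$1" -v ms="$2" \
        'BEGIN { printf "%.2f\n", 2 * n * n * n / (ms * 1e-3) / 1e12 }'
}

gemm_tflops 1024 15.23   # Naive row; prints 0.14
```

For the Naive row of the sample output this gives 0.14, matching the table. The "vs cuBLAS" column appears to be the ratio of the cuBLAS time to the kernel's time, for example 1.56 / 15.23 ≈ 10.2%.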

Step 4: Run MNIST Demo

bash
# Run from the Release build directory created in Step 3
./build-release-gcc-cuda/mnist_demo

This runs the inference engine on the MNIST dataset as an end-to-end correctness check.


Common Issues

Q: Cannot find CUDA?

Ensure the CUDA_PATH environment variable points to your CUDA installation, and that its bin and lib64 directories are on PATH and LD_LIBRARY_PATH:

bash
export CUDA_PATH=/usr/local/cuda
export PATH=$CUDA_PATH/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
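
To confirm the exports took effect, you can check that nvcc is present under CUDA_PATH. This is a minimal sketch; `cuda_path_ok` is a hypothetical helper name, and `/usr/local/cuda` above is only the common default location.

```bash
# cuda_path_ok [ROOT]
# Succeeds if ROOT (default: $CUDA_PATH) contains an executable bin/nvcc.
# Hypothetical helper, not part of the repository.
cuda_path_ok() {
    local root="${1:-${CUDA_PATH:-}}"
    [ -n "$root" ] && [ -x "$root/bin/nvcc" ]
}

if cuda_path_ok; then
    echo "nvcc found at $CUDA_PATH/bin/nvcc"
else
    echo "nvcc not found under CUDA_PATH='${CUDA_PATH:-}'"
fi
```

If the check fails, point CUDA_PATH at the directory that actually contains bin/nvcc on your system and re-run the configure step.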

Q: Tests are skipped?

GPU tests are skipped automatically when no NVIDIA GPU is available. This is expected behavior, not a failure.
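
To tell whether skipped tests are expected on your machine, you can check if the driver sees a GPU. This is a minimal sketch using `nvidia-smi -L`, which lists visible GPUs one per line; `has_gpu` is a hypothetical helper name, and how the test suite itself detects a GPU is not described here.

```bash
# has_gpu succeeds only if nvidia-smi is installed and lists at least one GPU.
# Hypothetical helper, not part of the repository.
has_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if has_gpu; then
    nvidia-smi -L
else
    echo "No NVIDIA GPU visible; skipped GPU tests are expected"
fi
```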

Q: Build errors?

  1. Check that your CUDA Toolkit version is 12.x (nvcc --version)
  2. Check that your C++ compiler supports C++17 (g++ --version)
  3. If Conda is active, prefer cmake --preset gcc-cuda, which uses the system GCC 12 / G++ 12
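
To see whether the compiler on your PATH comes from Conda, compare its location with CONDA_PREFIX, which `conda activate` sets. This is a minimal sketch; `compiler_from_conda` is a hypothetical helper name.

```bash
# compiler_from_conda CXX_PATH CONDA_PREFIX
# Succeeds if CXX_PATH lies inside the given Conda prefix.
# Hypothetical helper, not part of the repository.
compiler_from_conda() {
    local cxx_path="$1" conda_prefix="${2:-}"
    if [ -z "$conda_prefix" ]; then
        return 1
    fi
    case "$cxx_path" in
        "$conda_prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

cxx_path="$(command -v g++ || true)"
if compiler_from_conda "$cxx_path" "${CONDA_PREFIX:-}"; then
    echo "g++ resolves inside Conda ($cxx_path); consider: cmake --preset gcc-cuda"
else
    echo "g++ resolves to: ${cxx_path:-not found}"
fi
```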

MIT License | CUDA GEMM optimization tutorial