Troubleshooting

Common issues and solutions for Tiny-LLM.

Build Issues

CUDA not found

Error: Could not find CUDA or nvcc not found

Solutions:

bash

# Check CUDA installation
nvcc --version

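# Set CUDA path explicitly (adjust to your installed version)
cmake .. -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2
# If the build uses CMake's native CUDA language support, point it at nvcc instead
cmake .. -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.2/bin/nvcc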

# Or add to PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

CMake version too old

Error: CMake 3.18 or higher is required

Solutions:

bash

# Using pip
pip install --upgrade cmake

# Using snap (Ubuntu)
sudo snap install cmake --classic

# Build from source
curl -L https://cmake.org/files/v3.28/cmake-3.28.0.tar.gz | tar xz
cd cmake-3.28.0 && ./bootstrap && make && sudo make install

C++17 not supported

Error: error: 'auto' in lambda parameter not supported

Solutions:

bash

# Check compiler version
gcc --version  # Should be 9+
clang --version  # Should be 10+

# Specify compiler
cmake .. -DCMAKE_CXX_COMPILER=g++-11

# Or use environment variable
CC=gcc-11 CXX=g++-11 cmake ..

CUDA architecture mismatch

Error: No kernel image is available for execution on the device

Solutions:

bash

# Check your GPU compute capability
nvidia-smi --query-gpu=compute_cap --format=csv

# Build for your specific architecture
cmake .. -DCUDA_ARCH="80"  # For SM 8.0 (A100)
cmake .. -DCUDA_ARCH="86"  # For SM 8.6 (RTX 3090)
cmake .. -DCUDA_ARCH="89"  # For SM 8.9 (RTX 4090)

# Or use native detection
cmake .. -DCUDA_ARCH="native"

Runtime Issues

CUDA out of memory

Error: CUDA out of memory or cudaErrorMemoryAllocation

Solutions:

Reduce batch size

cpp

cache_config.max_batch_size = 1;  // Reduce from 4

Reduce sequence length

cpp

config.max_seq_len = 1024;  // Reduce from 2048

Monitor memory

cpp

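size_t free_bytes = 0, total_bytes = 0;
// Query free and total device memory on the current GPU (needs <cuda_runtime.h>)
cudaMemGetInfo(&free_bytes, &total_bytes);
std::cout << "Free: " << free_bytes / 1024 / 1024 << " MB of "
          << total_bytes / 1024 / 1024 << " MB" << std::endl;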
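To judge how far the batch size or sequence length has to drop, it can help to estimate the KV cache footprint before allocating anything. The sketch below assumes a conventional transformer KV cache (one key tensor and one value tensor per layer); the struct and its field names are hypothetical and may not match Tiny-LLM's actual config types.

```cpp
#include <cstddef>

// Hypothetical shape description; field names are illustrative only.
struct KVCacheShape {
    std::size_t num_layers;
    std::size_t num_kv_heads;
    std::size_t head_dim;
    std::size_t max_batch_size;
    std::size_t max_seq_len;
    std::size_t bytes_per_element;  // e.g. 2 for FP16
};

// Bytes needed to hold keys and values for every layer, sequence, and position.
// The leading factor of 2 accounts for storing both K and V.
constexpr std::size_t kv_cache_bytes(const KVCacheShape& s) {
    return 2 * s.num_layers * s.max_batch_size * s.max_seq_len *
           s.num_kv_heads * s.head_dim * s.bytes_per_element;
}
```

As an illustration with made-up dimensions: 32 layers, 32 KV heads of dimension 128, FP16 storage, batch size 4, and 2048 tokens come to 4 GiB; dropping to batch size 1 and 1024 tokens brings that down to 512 MiB. Comparing the result against the free memory reported by `cudaMemGetInfo` shows whether a configuration can fit at all.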

Illegal memory access

Error: an illegal memory access was encountered

Possible causes:

Incorrect model file format
Dimension mismatch between model and config
Uninitialized memory

Solutions:

Enable debug mode

bash

cmake .. -DCMAKE_BUILD_TYPE=Debug
make
# Make kernel launches synchronous so the error is reported at the failing call
CUDA_LAUNCH_BLOCKING=1 ./tiny_llm_demo

Run with compute-sanitizer

bash

# Preferred tool; cuda-memcheck was removed in CUDA 12
compute-sanitizer ./tiny_llm_demo

# Older CUDA toolkits only
cuda-memcheck ./tiny_llm_demo

Verify model dimensions

cpp

std::cout << "hidden_dim: " << config.hidden_dim
          << ", num_layers: " << config.num_layers << std::endl;

Slow generation speed

Possible causes:

Debug build
Not using W8A16 quantization
Incorrect CUDA architecture

Solutions:

Use Release build

bash

cmake .. -DCMAKE_BUILD_TYPE=Release

Verify GPU utilization

bash

watch -n 1 nvidia-smi

Profile the application

bash

nsys profile -o profile ./tiny_llm_demo
nsys-ui profile.nsys-rep  # older Nsight Systems releases write profile.qdrep instead

Performance Issues

Low GPU utilization

Symptom: GPU utilization < 50%

Solutions:

Increase batch size
Check for memory-bandwidth-bound operations
Profile kernels with Nsight Compute

Memory bandwidth bottleneck

Symptom: Decode phase slower than expected

Cause: Attention decode is memory-bandwidth bound

Solutions:

Use a GPU with higher memory bandwidth
Reduce KV cache size (smaller batch/seq_len)
Enable flash attention (if available)

Model Loading Issues

Invalid model file

Error: Failed to load model: invalid format

Checklist:

[ ] File exists and is readable
[ ] Magic number matches (first 4 bytes)
[ ] Version is supported
[ ] Dimensions match config
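The first three checklist items can be automated with a small pre-flight check. The sketch below assumes a header made of a 4-byte magic followed by a 4-byte version in host byte order; the magic value `TLLM`, the version limit, and all names here are placeholders, so substitute the values from the actual Tiny-LLM model format.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Placeholder values: NOT Tiny-LLM's real magic or version.
constexpr char kExpectedMagic[4] = {'T', 'L', 'L', 'M'};
constexpr std::uint32_t kMaxSupportedVersion = 1;

enum class HeaderStatus { Ok, CannotOpen, TooShort, BadMagic, UnsupportedVersion };

// Opens the file and validates the assumed 8-byte header.
HeaderStatus check_model_header(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return HeaderStatus::CannotOpen;  // missing or unreadable file

    char magic[4];
    std::uint32_t version = 0;
    if (!in.read(magic, sizeof(magic)) ||
        !in.read(reinterpret_cast<char*>(&version), sizeof(version))) {
        return HeaderStatus::TooShort;  // file ends before the header does
    }
    if (std::memcmp(magic, kExpectedMagic, sizeof(magic)) != 0) {
        return HeaderStatus::BadMagic;
    }
    if (version == 0 || version > kMaxSupportedVersion) {
        return HeaderStatus::UnsupportedVersion;
    }
    return HeaderStatus::Ok;
}
```

Calling `check_model_header(path)` before the real loader and printing the returned status narrows an "invalid format" error down to one specific checklist item.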

Dimension mismatch

Error: Weight dimension mismatch

Solutions:

cpp

// Verify config
std::cout << "vocab_size: " << config.vocab_size << std::endl;
std::cout << "hidden_dim: " << config.hidden_dim << std::endl;
std::cout << "intermediate_dim: " << config.intermediate_dim << std::endl;
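Printing the config only shows one side of the mismatch. If the dimensions stored in the model file are also available, comparing the two field by field points directly at the culprit. The sketch below assumes both sets of dimensions can be read into a simple struct; `ModelDims` and `diff_dims` are hypothetical names, not Tiny-LLM types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the dimensions printed above; not Tiny-LLM's real type.
struct ModelDims {
    std::size_t vocab_size;
    std::size_t hidden_dim;
    std::size_t intermediate_dim;
};

// Returns one message per field that differs between the file and the config.
std::vector<std::string> diff_dims(const ModelDims& file, const ModelDims& config) {
    std::vector<std::string> out;
    auto check = [&out](const char* name, std::size_t f, std::size_t c) {
        if (f != c) {
            out.push_back(std::string(name) + ": file=" + std::to_string(f) +
                          " config=" + std::to_string(c));
        }
    };
    check("vocab_size", file.vocab_size, config.vocab_size);
    check("hidden_dim", file.hidden_dim, config.hidden_dim);
    check("intermediate_dim", file.intermediate_dim, config.intermediate_dim);
    return out;
}
```

An empty result means the two agree on these three fields; otherwise each entry names the field along with both values, which is usually enough to tell whether the config or the model file is the one that is out of date.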

Getting Help

Debug Information to Include

When reporting issues, please provide:

System info

bash

nvidia-smi
nvcc --version
cmake --version

Build output

bash

cmake .. 2>&1 | tee cmake.log
make VERBOSE=1 2>&1 | tee build.log

Runtime error

bash

CUDA_LAUNCH_BLOCKING=1 ./tiny_llm_demo 2>&1 | tee runtime.log
