Examples
Welcome to the TensorCraft-HPC examples! This section provides hands-on tutorials and practical code samples to help you get started with high-performance AI kernel development.
Quick Links
| Example | Description | Difficulty |
|---|---|---|
| GEMM Tutorial | Build GEMM from scratch with progressive optimization | 🟢 Beginner |
| FlashAttention | Memory-efficient attention implementation | 🟡 Intermediate |
| Python Bindings | Use TensorCraft from Python | 🟢 Beginner |
Prerequisites
Before running the examples, make sure you have:
- CUDA Toolkit 11.0+ installed
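- CMake 3.18+ for building (the preset-based build commands below need CMake 3.20 or newer, since build presets were introduced in that release)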
- C++17 compatible compiler (GCC 9+, Clang 10+, MSVC 19.28+)
- Python 3.8+ (optional, for Python bindings)
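To confirm the tools above are installed before building, a quick check along these lines may help. This is a minimal sketch, not part of TensorCraft-HPC: `check_tool` is a hypothetical helper name, and the loop assumes a Linux-style setup with GCC (swap `g++` for `clang++` if that is your compiler). It only prints the versions it finds; it does not compare them against the minimums listed above.

```bash
#!/usr/bin/env bash
# Report whether each prerequisite tool is on PATH and, if so, its version.
# check_tool is a hypothetical helper, not something shipped with the project.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # Print the first line of --version output that contains an x.y number.
        printf '%-8s found: %s\n' "$tool" \
            "$("$tool" --version 2>&1 | grep -m1 -E '[0-9]+\.[0-9]+')"
    else
        printf '%-8s missing\n' "$tool"
        return 1
    fi
}

# Assumed tool names; adjust to match your compiler and Python launcher.
for tool in nvcc cmake g++ python3; do
    check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed or added to `PATH` first. Python is only needed for the Python bindings.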
Running the Examples
C++ Examples
```bash
# Clone and build
git clone https://github.com/LessUp/modern-ai-kernels.git
cd modern-ai-kernels
# Build with CUDA support
cmake --preset dev
cmake --build --preset dev
# Run an example
./build/dev/examples/gemm_example
```
Python Examples
```bash
# Install Python package
pip install -e .
# Run Python example
python examples/python/gemm_demo.py
```
Learning Path
We recommend following this order for the best learning experience:
1. Start with GEMM Tutorial: learn the fundamentals of CUDA kernel optimization
2. Explore FlashAttention: understand memory-efficient computing patterns
3. Try Python Bindings: integrate TensorCraft into your Python workflow
Need Help?
- Check the API Reference for detailed documentation
- Browse Learning Resources for more tutorials
- Open an issue on GitHub if you encounter problems