Convolution Performance
Detailed benchmarks for convolution operations.
Gaussian Blur
Varying Kernel Size (4K Image)
| Kernel | CPU (ms) | GPU (ms) | Speedup |
|---|---|---|---|
| 3×3 | 12.5 | 0.4 | 31.3× |
| 5×5 | 45.2 | 1.2 | 37.7× |
| 7×7 | 78.4 | 1.8 | 43.6× |
| 9×9 | 110.2 | 2.4 | 45.9× |
| 15×15 | 120.5 | 3.8 | 31.7× |
Varying Image Size (5×5 Kernel)
| Image | CPU (ms) | GPU (ms) | Speedup |
|---|---|---|---|
| HD | 3.2 | 0.2 | 16.0× |
| FHD | 10.5 | 0.5 | 21.0× |
| 4K | 45.2 | 1.2 | 37.7× |
| 8K | 180.4 | 4.5 | 40.1× |
Sobel Edge Detection
| Image | CPU (ms) | GPU (ms) | Speedup |
|---|---|---|---|
| HD | 8.1 | 0.3 | 27.0× |
| FHD | 18.2 | 0.5 | 36.4× |
| 4K | 38.1 | 0.9 | 42.3× |
| 8K | 150.2 | 3.2 | 46.9× |
Custom Kernels
For 7×7 custom convolution kernel:
| Image | CPU (ms) | GPU (ms) | Speedup |
|---|---|---|---|
| HD | 15.2 | 0.5 | 30.4× |
| FHD | 32.4 | 1.0 | 32.4× |
| 4K | 65.3 | 2.1 | 31.1× |
| 8K | 260.1 | 8.2 | 31.7× |
Optimization Notes
- Shared memory tiling used for all kernels
- Optimal performance with kernel sizes ≤ 15
- Larger kernels use separable convolution when possible