Convolution Performance

Detailed benchmarks for convolution operations.

Gaussian Blur

Varying Kernel Size (4K Image)

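| Kernel | CPU (ms) | GPU (ms) | Speedup |
|--------|----------|----------|---------|
| 3×3    | 12.5     | 0.4      | 31.3×   |
| 5×5    | 45.2     | 1.2      | 37.7×   |
| 7×7    | 78.4     | 1.8      | 43.6×   |
| 9×9    | 110.2    | 2.4      | 45.9×   |
| 15×15  | 120.5    | 3.8      | 31.7×   |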
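
The Speedup column appears to be the CPU time divided by the GPU time. The short sketch below recomputes it from the kernel-size figures as a consistency check; the `speedup` helper and the `rows` layout are illustrative names, not part of the benchmarked library.

```python
# Rows copied from the "Varying Kernel Size (4K Image)" table:
# (kernel, CPU time in ms, GPU time in ms).
rows = [
    ("3×3", 12.5, 0.4),
    ("5×5", 45.2, 1.2),
    ("7×7", 78.4, 1.8),
    ("9×9", 110.2, 2.4),
    ("15×15", 120.5, 3.8),
]


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Return the CPU-to-GPU time ratio (hypothetical helper)."""
    return cpu_ms / gpu_ms


# Map each kernel size to its recomputed speedup.
speedups = {kernel: speedup(cpu, gpu) for kernel, cpu, gpu in rows}

for kernel, ratio in speedups.items():
    print(f"{kernel}: {ratio:.2f}x")
```

The recomputed ratios agree with the published column to within rounding of the last digit.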

Varying Image Size (5×5 Kernel)

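| Image | CPU (ms) | GPU (ms) | Speedup |
|-------|----------|----------|---------|
| HD    | 3.2      | 0.2      | 16.0×   |
| FHD   | 10.5     | 0.5      | 21.0×   |
| 4K    | 45.2     | 1.2      | 37.7×   |
| 8K    | 180.4    | 4.5      | 40.1×   |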

Sobel Edge Detection

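| Image | CPU (ms) | GPU (ms) | Speedup |
|-------|----------|----------|---------|
| HD    | 8.1      | 0.3      | 27.0×   |
| FHD   | 18.2     | 0.5      | 36.4×   |
| 4K    | 38.1     | 0.9      | 42.3×   |
| 8K    | 150.2    | 3.2      | 46.9×   |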

Custom Kernels

For a 7×7 custom convolution kernel:

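| Image | CPU (ms) | GPU (ms) | Speedup |
|-------|----------|----------|---------|
| HD    | 15.2     | 0.5      | 30.4×   |
| FHD   | 32.4     | 1.0      | 32.4×   |
| 4K    | 65.3     | 2.1      | 31.1×   |
| 8K    | 260.1    | 8.2      | 31.7×   |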

Optimization Notes

  • Shared-memory tiling is used for all kernels
  • Performance is best for kernel sizes up to 15×15
  • Larger kernels use separable convolution when the kernel is separable
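
To illustrate the separable-convolution note above: a 2D Gaussian kernel is the outer product of two 1D kernels, so one K×K pass can be replaced by a row pass followed by a column pass, cutting the work per pixel from K² to 2K multiply-adds. The sketch below is a plain-Python CPU illustration of that identity, not the library's GPU implementation; the function names, the zero-padding border handling, and the `sigma` value are all assumptions made for the example.

```python
import math


def gaussian_taps(size, sigma):
    """Normalized 1D Gaussian taps for an odd kernel size."""
    r = size // 2
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def filter_2d(img, kernel):
    """Direct 2D filter with zero padding: K*K multiply-adds per pixel.

    The kernel is not flipped, which makes no difference for a
    symmetric kernel such as a Gaussian.
    """
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j, row in enumerate(kernel):
                for i, weight in enumerate(row):
                    yy, xx = y + j - r, x + i - r
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += img[yy][xx] * weight
    return out


def filter_rows(img, taps):
    """Horizontal 1D pass with zero padding: K multiply-adds per pixel."""
    h, w, r = len(img), len(img[0]), len(taps) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i, weight in enumerate(taps):
                xx = x + i - r
                if 0 <= xx < w:
                    out[y][x] += img[y][xx] * weight
    return out


def transpose(img):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*img)]


def filter_separable(img, taps):
    """Row pass, then column pass: 2*K multiply-adds per pixel."""
    # The column pass is a row pass applied to the transposed image.
    return transpose(filter_rows(transpose(filter_rows(img, taps)), taps))


taps = gaussian_taps(15, 3.0)                    # sigma = 3.0 is an assumed value
kernel = [[a * b for b in taps] for a in taps]   # 2D kernel as an outer product
img = [[float((3 * x + 7 * y) % 11) for x in range(20)] for y in range(18)]

direct = filter_2d(img, kernel)
separable = filter_separable(img, taps)
max_diff = max(
    abs(a - b) for row_a, row_b in zip(direct, separable) for a, b in zip(row_a, row_b)
)
print(f"max difference between the two results: {max_diff:.2e}")
print(f"multiply-adds per pixel: {len(taps) ** 2} direct vs {2 * len(taps)} separable")
```

The two results differ only by floating-point rounding. The factorization applies only when the kernel can be written as an outer product, which holds for Gaussian and Sobel kernels but not for arbitrary custom kernels.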

Released under the MIT License.