Performance Benchmarks

This project does not treat any single benchmark table as universal truth. WebGPU performance depends heavily on browser version, driver quality, GPU architecture, array size, and whether you can reuse buffers across runs.

How to evaluate performance

Use the interactive demo to test the current build on your own machine. Compare:

GPU time - compute work only
Total time - upload, compute, and readback together
CPU time - TypedArray.sort() as a local baseline
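
The CPU baseline can be measured without any GPU code. The sketch below is one way to do it, assuming `performance.now()` is available (browsers and recent Node versions); `timeCpuSort` and `makeInput` are illustrative helper names, not part of this project's API.

```typescript
// Time TypedArray.sort() on a copy so the original input stays unsorted
// and can be reused for the GPU runs.
function timeCpuSort(input: Uint32Array): { sorted: Uint32Array; ms: number } {
  const copy = input.slice(); // sort() works in place, so never sort the original
  const start = performance.now();
  copy.sort(); // typed arrays sort numerically when no comparator is given
  const ms = performance.now() - start;
  return { sorted: copy, ms };
}

// Build a reproducible pseudo-random workload (simple LCG, not cryptographic).
function makeInput(length: number, seed = 1): Uint32Array {
  const data = new Uint32Array(length);
  let state = seed >>> 0;
  for (let i = 0; i < length; i++) {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    data[i] = state;
  }
  return data;
}

const input = makeInput(100_000);
const baseline = timeCpuSort(input);
console.log(`CPU sort of ${input.length} elements: ${baseline.ms.toFixed(2)} ms`);
```

Feeding the same seeded input to the GPU sorters keeps the three timings comparable.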

What usually matters most

Input size

Small arrays often stay CPU-favorable because buffer transfer overhead dominates. Larger arrays are where the compute work can grow large enough to outweigh that overhead, so that is where GPU sorting is worth measuring.

Reuse

Repeated sorts get faster when you reuse the same GPUContext and keep buffers alive between runs, so that setup and allocation costs are paid once rather than on every sort.

Algorithm choice

| Use case | Better starting point | Why |
| --- | --- | --- |
| General reference implementation | `BitonicSorter` | Predictable structure and simpler reasoning |
| Large `Uint32Array` workloads | `RadixSorter` | Fewer wasted comparisons on integer-heavy data |
| Small or one-off arrays | CPU sort | Lower setup cost |
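
The table can be read as a simple decision rule. The sketch below is only an illustration of that rule: the `Workload` shape, the `chooseSorter` function, and the `cpuCutoff` parameter are all hypothetical, and the cutoff is a machine-specific number you would have to measure yourself.

```typescript
type SorterChoice = "cpu" | "bitonic" | "radix";

interface Workload {
  length: number;    // number of elements to sort
  isUint32: boolean; // true when the data is a Uint32Array
  oneOff: boolean;   // true when the sort will not be repeated
}

// cpuCutoff is an assumed, measured threshold: below it, setup and transfer
// cost is expected to outweigh the GPU work on this machine.
function chooseSorter(w: Workload, cpuCutoff: number): SorterChoice {
  if (w.oneOff || w.length < cpuCutoff) return "cpu"; // lower setup cost
  return w.isUint32 ? "radix" : "bitonic";            // integer-heavy vs general
}

const choice = chooseSorter({ length: 2_000_000, isUint32: true, oneOff: false }, 100_000);
console.log(choice);
```

The value `100_000` in the usage line is a placeholder, not a recommendation; treat the result as a starting point and confirm it with real timings.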

Benchmark workflow

1. Start with a small array and confirm correctness.
2. Increase array size until transfer overhead stops dominating.
3. Compare GPU-only time with total time; both matter.
4. Repeat the same run several times to smooth out shader compilation and warm-up effects.
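
The repeat-and-smooth step can be sketched as a small harness that discards a few warm-up runs and reports the median of the timed ones. This is an illustrative sketch, not the demo's implementation: `SortFn`, `benchmark`, and `median` are hypothetical names, and a CPU sort stands in for a GPU sorter so the example runs anywhere.

```typescript
// Any async sort can be plugged in here, for example a wrapper around a GPU sorter.
type SortFn = (data: Uint32Array) => Promise<Uint32Array>;

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample set");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

async function benchmark(
  sortFn: SortFn,
  data: Uint32Array,
  warmupRuns = 2,
  timedRuns = 5,
): Promise<number> {
  // Warm-up runs absorb one-time costs such as shader compilation.
  for (let i = 0; i < warmupRuns; i++) {
    await sortFn(data.slice());
  }
  const samples: number[] = [];
  for (let i = 0; i < timedRuns; i++) {
    const copy = data.slice(); // fresh unsorted copy for every timed run
    const start = performance.now();
    await sortFn(copy);
    samples.push(performance.now() - start);
  }
  return median(samples); // total time in ms, less sensitive to outliers than the mean
}

// Stand-in sorter so the sketch is runnable without WebGPU.
const cpuSort: SortFn = async (data) => data.sort();

const descending = new Uint32Array(50_000);
for (let i = 0; i < descending.length; i++) descending[i] = descending.length - i;

benchmark(cpuSort, descending).then((ms) => {
  console.log(`median of 5 runs: ${ms.toFixed(2)} ms`);
});
```

The median is used instead of the mean so that a single slow run does not skew the reported number.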

Interpreting results

GPU time faster than the CPU baseline, total time slower: the shader work is usually fine, but transfer and setup cost dominates.
Both GPU time and total time faster than the CPU baseline: a good browser/GPU fit for that workload.
Radix slower than Bitonic: this can happen on smaller arrays, where the extra passes do not amortize well.
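
The first two readings can be written down as a small classifier over the three timings. This is a sketch under assumptions: the `Timings` shape and the verdict labels are made up for illustration, and the final "cpu-favorable" case (GPU compute itself slower than the CPU) is an addition that the list above does not spell out.

```typescript
interface Timings {
  gpuMs: number;   // compute work only
  totalMs: number; // upload, compute, and readback together
  cpuMs: number;   // TypedArray.sort() baseline
}

type Verdict = "good-fit" | "transfer-bound" | "cpu-favorable";

function interpret(t: Timings): Verdict {
  const gpuFaster = t.gpuMs < t.cpuMs;
  const totalFaster = t.totalMs < t.cpuMs;
  if (gpuFaster && totalFaster) return "good-fit"; // both beat the CPU baseline
  if (gpuFaster) return "transfer-bound";          // shader is fine, transfer/setup dominates
  return "cpu-favorable";                          // assumed label: even raw compute loses
}

console.log(interpret({ gpuMs: 2, totalMs: 12, cpuMs: 8 }));
```

The numbers in the usage line are invented inputs chosen to show the transfer-bound case, not measurements.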

Practical tips

Reuse the same context

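// Create and initialize the context once.
const gpu = new GPUContext();
await gpu.initialize();

// Reuse the same sorter, and therefore the same context, for every batch.
const sorter = new BitonicSorter(gpu);
await sorter.sort(batchA);
await sorter.sort(batchB);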

Preallocate when sizes are predictable

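// `gpu` is an already initialized GPUContext, as in the previous example.
const sorter = new RadixSorter(gpu);
// Allocate buffers up front for the expected workload size.
sorter.preallocate(1_000_000);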

Measure both correctness and throughput

Enable validation while developing, then disable it when you only want raw throughput measurements.
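
Independently of the library's own validation, a benchmark script can check a sorter's output against the CPU result. The helper below is a minimal sketch with a hypothetical name, `matchesCpuSort`; it compares against a sorted copy of the input, so it catches lost or duplicated elements as well as ordering mistakes.

```typescript
// Returns true only if `output` holds exactly the elements of `input`
// in ascending order.
function matchesCpuSort(input: Uint32Array, output: Uint32Array): boolean {
  if (input.length !== output.length) return false;
  const expected = input.slice().sort(); // numeric sort on a copy
  for (let i = 0; i < expected.length; i++) {
    if (expected[i] !== output[i]) return false;
  }
  return true;
}

const original = Uint32Array.of(3, 1, 2);
console.log(matchesCpuSort(original, Uint32Array.of(1, 2, 3))); // correct output
console.log(matchesCpuSort(original, Uint32Array.of(1, 2, 2))); // ordered, but wrong elements
```

Because this check performs a full CPU sort, keep it out of the timed region when measuring throughput.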

Run your own benchmark

The repository ships a maintained browser playground specifically for this purpose. Open the interactive demo, choose your workload size, and compare Bitonic, Radix, and CPU timings on the target machine.
