Skip to content

Ray Sorting

Ray sorting is a GPU optimization technique that improves rendering performance by reducing warp divergence.

Warp Divergence Problem

On GPUs, 32 threads within the same warp execute the same instruction. When rays hit different objects, branching occurs:

cpp
// Code causing warp divergence
if (ray hits sphere A) {
    shade_sphere_A();
} else if (ray hits sphere B) {
    shade_sphere_B();
} else {
    // ...
}

Sorting Strategy

Sort rays by hit object ID so adjacent rays hit the same object:

cpp
// Sort rays by object_id
thrust::sort_by_key(
    thrust::device,
    object_ids,           // keys: object IDs
    object_ids + num_rays,
    ray_indices           // values: ray indices
);

Performance Impact

SceneUnsorted (ms)Sorted (ms)Improvement
Demo453620%
Random1207240%
Complex35021040%

Implementation

cpp
void ray_sort(
    int* object_ids,
    int* ray_indices,
    int num_rays,
    cudaStream_t stream
) {
    // Initialize ray indices
    thrust::sequence(
        thrust::cuda::par.on(stream),
        ray_indices,
        ray_indices + num_rays
    );

    // Sort by object ID
    thrust::sort_by_key(
        thrust::cuda::par.on(stream),
        object_ids,
        object_ids + num_rays,
        ray_indices
    );
}

Applicability

  • Applicable: Phong shading + single sample mode
  • Not Applicable: Path tracing (ray directions change randomly)

How It Works

  1. Generate Keys: For each ray, create a sort key containing the object ID it hit
  2. Sort: Use Thrust's sort_by_key to group rays by object ID
  3. Shade: Process sorted rays - consecutive rays now hit the same object

Warp Efficiency Measurement

cpp
__device__ float compute_warp_efficiency(int* object_ids, int num_rays) {
    int same_object_count = 0;
    for (int i = 0; i < num_rays; i += 32) {
        int first_obj = object_ids[i];
        for (int j = 1; j < 32 && i + j < num_rays; ++j) {
            if (object_ids[i + j] == first_obj) {
                same_object_count++;
            }
        }
    }
    return (float)same_object_count / (num_rays - 32);
}

References

  • [Hoberock 2010] "Thrust: A Parallel Algorithms Library"

Technical Whitepaper · Built with VitePress