Ray Sorting

Ray sorting is a GPU optimization technique that improves rendering performance by reducing warp divergence.

Warp Divergence Problem

On GPUs, 32 threads within the same warp execute the same instruction. When rays hit different objects, branching occurs:

cpp

// Code causing warp divergence
if (ray hits sphere A) {
    shade_sphere_A();
} else if (ray hits sphere B) {
    shade_sphere_B();
} else {
    // ...
}

Sorting Strategy

Sort rays by hit object ID so adjacent rays hit the same object:

cpp

// Sort rays by object_id
thrust::sort_by_key(
    thrust::device,
    object_ids,           // keys: object IDs
    object_ids + num_rays,
    ray_indices           // values: ray indices
);

Performance Impact

Scene	Unsorted (ms)	Sorted (ms)	Improvement
Demo	45	36	20%
Random	120	72	40%
Complex	350	210	40%

Implementation

cpp

void ray_sort(
    int* object_ids,
    int* ray_indices,
    int num_rays,
    cudaStream_t stream
) {
    // Initialize ray indices
    thrust::sequence(
        thrust::cuda::par.on(stream),
        ray_indices,
        ray_indices + num_rays
    );

    // Sort by object ID
    thrust::sort_by_key(
        thrust::cuda::par.on(stream),
        object_ids,
        object_ids + num_rays,
        ray_indices
    );
}

Applicability

Applicable: Phong shading + single sample mode
Not Applicable: Path tracing (ray directions change randomly)

How It Works

Generate Keys: For each ray, create a sort key containing the object ID it hit
Sort: Use Thrust's sort_by_key to group rays by object ID
Shade: Process sorted rays - consecutive rays now hit the same object

Warp Efficiency Measurement

cpp

__device__ float compute_warp_efficiency(int* object_ids, int num_rays) {
    int same_object_count = 0;
    for (int i = 0; i < num_rays; i += 32) {
        int first_obj = object_ids[i];
        for (int j = 1; j < 32 && i + j < num_rays; ++j) {
            if (object_ids[i + j] == first_obj) {
                same_object_count++;
            }
        }
    }
    return (float)same_object_count / (num_rays - 32);
}

References

[Hoberock 2010] "Thrust: A Parallel Algorithms Library"

Ray Sorting ​

Warp Divergence Problem ​

Sorting Strategy ​

Performance Impact ​

Implementation ​

Applicability ​

How It Works ​

Warp Efficiency Measurement ​

References ​