Ray Sorting
Ray sorting is a GPU optimization technique that improves rendering performance by reducing warp divergence.
Warp Divergence Problem
On GPUs, 32 threads within the same warp execute the same instruction. When rays hit different objects, branching occurs:
cpp
// Code causing warp divergence
if (ray hits sphere A) {
shade_sphere_A();
} else if (ray hits sphere B) {
shade_sphere_B();
} else {
// ...
}Sorting Strategy
Sort rays by hit object ID so adjacent rays hit the same object:
cpp
// Sort rays by object_id
thrust::sort_by_key(
thrust::device,
object_ids, // keys: object IDs
object_ids + num_rays,
ray_indices // values: ray indices
);Performance Impact
| Scene | Unsorted (ms) | Sorted (ms) | Improvement |
|---|---|---|---|
| Demo | 45 | 36 | 20% |
| Random | 120 | 72 | 40% |
| Complex | 350 | 210 | 40% |
Implementation
cpp
void ray_sort(
int* object_ids,
int* ray_indices,
int num_rays,
cudaStream_t stream
) {
// Initialize ray indices
thrust::sequence(
thrust::cuda::par.on(stream),
ray_indices,
ray_indices + num_rays
);
// Sort by object ID
thrust::sort_by_key(
thrust::cuda::par.on(stream),
object_ids,
object_ids + num_rays,
ray_indices
);
}Applicability
- Applicable: Phong shading + single sample mode
- Not Applicable: Path tracing (ray directions change randomly)
How It Works
- Generate Keys: For each ray, create a sort key containing the object ID it hit
- Sort: Use Thrust's
sort_by_keyto group rays by object ID - Shade: Process sorted rays - consecutive rays now hit the same object
Warp Efficiency Measurement
cpp
__device__ float compute_warp_efficiency(int* object_ids, int num_rays) {
int same_object_count = 0;
for (int i = 0; i < num_rays; i += 32) {
int first_obj = object_ids[i];
for (int j = 1; j < 32 && i + j < num_rays; ++j) {
if (object_ids[i + j] == first_obj) {
same_object_count++;
}
}
}
return (float)same_object_count / (num_rays - 32);
}References
- [Hoberock 2010] "Thrust: A Parallel Algorithms Library"