DAG Scheduler API
Reference for DAGScheduler, the execution engine that maps DAG tasks to CUDA streams.
Class definition
```cpp
class DAGScheduler {
public:
    explicit DAGScheduler(int numStreams = 4);
    ~DAGScheduler();
    cudaError_t execute(TaskGraph& graph);
    void setErrorCallback(std::function<void(int taskId, cudaError_t)> cb);
    int getNumStreams() const;
    int getTaskStream(int taskId) const;
    bool hasSynchronization(int fromTask, int toTask) const;
    void setGraphExecutionEnabled(bool enabled);
    bool isGraphExecutionEnabled() const;
    bool didReplayLastGraph() const;
    bool hasCapturedGraph() const;
};
```

Core methods
DAGScheduler(int numStreams = 4)
Creates scheduler-owned CUDA streams used for task execution.
```cpp
PipelineConfig config;
config.numStreams = 4;
Pipeline pipeline(config); // Internally constructs DAGScheduler(4)
```

execute(TaskGraph& graph)
Executes tasks in topological order with dependency-aware synchronization.
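As an illustration of what "topological order" means here, the following is a minimal sketch of Kahn's algorithm in plain C++. It is not the scheduler's actual implementation: the `DependencyList` alias and the `topologicalOrder` function are hypothetical stand-ins for `TaskGraph` and its internals, and task ids are assumed to be the indices `0..n-1`.

```cpp
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for TaskGraph: deps[i] lists the tasks that task i depends on.
using DependencyList = std::vector<std::vector<int>>;

// Returns task ids so that every task appears after all of its dependencies
// (Kahn's algorithm). Throws if the graph contains a cycle.
std::vector<int> topologicalOrder(const DependencyList& deps) {
    const int n = static_cast<int>(deps.size());
    std::vector<int> remaining(n, 0);             // unmet dependency count per task
    std::vector<std::vector<int>> dependents(n);  // reverse edges: producer -> consumers

    for (int task = 0; task < n; ++task) {
        remaining[task] = static_cast<int>(deps[task].size());
        for (int dep : deps[task]) {
            dependents[dep].push_back(task);
        }
    }

    // Tasks with no dependencies are ready immediately.
    std::queue<int> ready;
    for (int task = 0; task < n; ++task) {
        if (remaining[task] == 0) ready.push(task);
    }

    std::vector<int> order;
    order.reserve(n);
    while (!ready.empty()) {
        const int task = ready.front();
        ready.pop();
        order.push_back(task);
        // Completing a task may unblock its consumers.
        for (int consumer : dependents[task]) {
            if (--remaining[consumer] == 0) ready.push(consumer);
        }
    }

    if (static_cast<int>(order.size()) != n) {
        throw std::invalid_argument("task graph contains a cycle");
    }
    return order;
}
```

A task is emitted only once its count of unmet dependencies reaches zero, which is the property that makes dependency-aware synchronization possible.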
```cpp
cudaError_t err = pipeline.getScheduler().execute(pipeline.getTaskGraph());
```

setErrorCallback(...)
Registers a callback that is invoked when a task fails.
```cpp
scheduler.setErrorCallback([](int taskId, cudaError_t err) {
    std::cerr << "Task " << taskId << " failed: " << cudaGetErrorString(err) << std::endl;
});
```

Execution model
Stream assignment
- Source tasks (tasks with no dependencies) are distributed across streams by stream index.
- Dependent tasks prefer streams not used by their dependencies.
- If all streams are occupied by dependencies, one dependency stream is reused.
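The assignment rules above can be sketched in plain C++ as follows. This is an illustrative model, not the scheduler's code: `assignStreams` is a hypothetical helper, tasks are assumed to be indexed in topological order, source tasks are assumed to be distributed round-robin, and the choice of which dependency stream to reuse (the first dependency's) is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// deps[i] lists the tasks that task i depends on (hypothetical stand-in for TaskGraph).
// Assumes tasks are indexed in topological order and numStreams >= 1.
std::vector<int> assignStreams(const std::vector<std::vector<int>>& deps, int numStreams) {
    std::vector<int> streamOf(deps.size(), -1);
    int nextSourceStream = 0;  // round-robin cursor for source tasks (assumed policy)

    for (std::size_t task = 0; task < deps.size(); ++task) {
        if (deps[task].empty()) {
            // Source task: take the next stream index in turn.
            streamOf[task] = nextSourceStream;
            nextSourceStream = (nextSourceStream + 1) % numStreams;
            continue;
        }

        // Mark the streams already used by this task's dependencies.
        std::vector<bool> usedByDeps(numStreams, false);
        for (int dep : deps[task]) {
            usedByDeps[streamOf[dep]] = true;
        }

        // Prefer the lowest-index stream that no dependency uses.
        int chosen = -1;
        for (int s = 0; s < numStreams; ++s) {
            if (!usedByDeps[s]) {
                chosen = s;
                break;
            }
        }

        // Every stream is taken by a dependency: reuse the first dependency's stream.
        if (chosen < 0) {
            chosen = streamOf[deps[task].front()];
        }
        streamOf[task] = chosen;
    }
    return streamOf;
}
```

With two streams and a graph where tasks 0 and 1 are sources, task 2 depends on both, and task 3 depends on task 2, this model yields streams `0, 1, 0, 1`. The real mapping for a given graph can be inspected with `getTaskStream(taskId)`.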
Dependency synchronization
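For dependencies that cross streams, the scheduler uses CUDA events:
```cpp
// Mark the point in the producer stream where the producer task's work ends.
cudaEventRecord(event, producerStream);
// Make the consumer stream wait for that point without blocking the host.
cudaStreamWaitEvent(consumerStream, event, 0);
```

Failure propagation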
When task `T` fails:
- `T` is marked `FAILED`.
- Error callback is triggered.
- Downstream dependent tasks are marked `FAILED` recursively.
- Independent branches may continue.
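The propagation rule above can be modeled as a graph traversal over consumer edges. The sketch below is an assumption about how such marking could work, not the scheduler's implementation: `propagateFailure` and the `dependents` layout are hypothetical, and the error callback is left out for brevity.

```cpp
#include <vector>

enum class TaskState { PENDING, READY, RUNNING, COMPLETED, FAILED };

// dependents[i] lists the tasks that consume the output of task i (hypothetical layout).
// Marks failedTask and everything reachable from it as FAILED.
void propagateFailure(int failedTask,
                      const std::vector<std::vector<int>>& dependents,
                      std::vector<TaskState>& states) {
    states[failedTask] = TaskState::FAILED;

    // Iterative depth-first walk; an explicit stack avoids deep recursion on long chains.
    std::vector<int> stack{failedTask};
    while (!stack.empty()) {
        const int task = stack.back();
        stack.pop_back();
        for (int consumer : dependents[task]) {
            if (states[consumer] != TaskState::FAILED) {
                states[consumer] = TaskState::FAILED;
                stack.push_back(consumer);
            }
        }
    }
}
```

Tasks that are not reachable from the failed task keep their current state, which is why independent branches can continue.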
Task states
```cpp
enum class TaskState {
    PENDING,
    READY,
    RUNNING,
    COMPLETED,
    FAILED
};
```

CUDA Graph controls
The scheduler supports optional CUDA Graph capture and replay for stable workloads:
```cpp
pipeline.getScheduler().setGraphExecutionEnabled(true);
```

Use these signals for diagnostics:
- `hasCapturedGraph()`
- `didReplayLastGraph()`
- `isGraphExecutionEnabled()`
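One way to combine the three diagnostic queries into a single log line is sketched below. The interpretation of each flag is an assumption drawn from the method names, `describeGraphState` is a hypothetical helper, and `FakeScheduler` is a stand-in used only to keep the example runnable without CUDA.

```cpp
#include <string>

// Stand-in exposing the same three queries as DAGScheduler (illustration only).
struct FakeScheduler {
    bool enabled;
    bool captured;
    bool replayed;
    bool isGraphExecutionEnabled() const { return enabled; }
    bool hasCapturedGraph() const { return captured; }
    bool didReplayLastGraph() const { return replayed; }
};

// Works with any type that provides the three query methods.
template <typename Scheduler>
std::string describeGraphState(const Scheduler& scheduler) {
    if (!scheduler.isGraphExecutionEnabled()) {
        return "graph execution disabled";
    }
    if (!scheduler.hasCapturedGraph()) {
        return "enabled, no graph captured yet";
    }
    return scheduler.didReplayLastGraph()
               ? "enabled, last run replayed the captured graph"
               : "enabled, graph captured but last run did not replay it";
}
```

Calling `describeGraphState(pipeline.getScheduler())` after each `execute()` would show, under these assumptions, whether a workload is stable enough to benefit from replay.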
Tuning guidance
| Topology type | Suggested streams |
|---|---|
| Single operator / linear chain | 1-2 |
| Moderate DAG (3-6 operators) | 2-4 |
| Fork-join topology | 4 |
| Complex parallel DAG | 4-8 |
Prefer empirical tuning with benchmark pages and profile traces.
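A simple sweep over candidate stream counts is one way to do this empirically. The helpers below are hypothetical and not part of the scheduler API: `timeMs` measures wall-clock time for a callable, and `pickStreamCount` returns the candidate with the lowest measured time.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Wall-clock duration of one call to `work`, in milliseconds.
double timeMs(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Returns the candidate whose measured time is lowest; ties keep the earlier candidate.
// `measureMs(numStreams)` is supplied by the caller and returns a time in milliseconds.
int pickStreamCount(const std::vector<int>& candidates,
                    const std::function<double(int)>& measureMs) {
    if (candidates.empty()) {
        throw std::invalid_argument("no candidate stream counts");
    }
    int best = candidates.front();
    double bestMs = measureMs(best);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const double ms = measureMs(candidates[i]);
        if (ms < bestMs) {
            bestMs = ms;
            best = candidates[i];
        }
    }
    return best;
}
```

In practice `measureMs` would build a pipeline with the given `numStreams`, run it several times inside `timeMs`, and return an average. Because kernel launches are asynchronous, the timed callable should synchronize with the device (for example with `cudaDeviceSynchronize()`) before returning, otherwise only launch overhead is measured.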