
DAG Scheduler API

Reference for DAGScheduler, the execution engine that maps DAG tasks to CUDA streams.

Class definition

cpp
#include <cuda_runtime.h>  // cudaError_t
#include <functional>      // std::function

class DAGScheduler {
public:
    explicit DAGScheduler(int numStreams = 4);
    ~DAGScheduler();

    // Execution and error reporting
    cudaError_t execute(TaskGraph& graph);
    void setErrorCallback(std::function<void(int taskId, cudaError_t)> cb);

    // Stream assignment and synchronization queries
    int getNumStreams() const;
    int getTaskStream(int taskId) const;
    bool hasSynchronization(int fromTask, int toTask) const;

    // CUDA Graph capture/replay controls and diagnostics
    void setGraphExecutionEnabled(bool enabled);
    bool isGraphExecutionEnabled() const;
    bool didReplayLastGraph() const;
    bool hasCapturedGraph() const;
};

Core methods

DAGScheduler(int numStreams = 4)

Creates scheduler-owned CUDA streams used for task execution.

cpp
PipelineConfig config;
config.numStreams = 4;
Pipeline pipeline(config);  // Internally constructs DAGScheduler(4)

execute(TaskGraph& graph)

Executes tasks in topological order with dependency-aware synchronization.

cpp
cudaError_t err = pipeline.getScheduler().execute(pipeline.getTaskGraph());
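To make "topological order" concrete, here is a minimal host-only sketch using Kahn's algorithm. It is not the scheduler's implementation: the helper name topologicalOrder and the edge-list representation are assumptions made for illustration, and the real TaskGraph type is not used.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the DAGScheduler API.
// edges holds (producer, consumer) pairs over task IDs 0..numTasks-1.
// Returns one valid topological order, or an empty vector if the graph has a cycle.
std::vector<int> topologicalOrder(int numTasks,
                                  const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> consumers(numTasks);
    std::vector<int> pendingDeps(numTasks, 0);
    for (const auto& e : edges) {
        consumers[e.first].push_back(e.second);
        ++pendingDeps[e.second];
    }

    // Tasks with no unfinished dependencies are ready to be issued.
    std::queue<int> ready;
    for (int t = 0; t < numTasks; ++t) {
        if (pendingDeps[t] == 0) ready.push(t);
    }

    std::vector<int> order;
    while (!ready.empty()) {
        int t = ready.front();
        ready.pop();
        order.push_back(t);
        // Issuing t releases each consumer whose last dependency was t.
        for (int next : consumers[t]) {
            if (--pendingDeps[next] == 0) ready.push(next);
        }
    }

    // Any task left unissued means the graph contains a cycle.
    if (order.size() != static_cast<std::size_t>(numTasks)) order.clear();
    return order;
}
```

For a diamond graph with edges 0→1, 0→2, 1→3, 2→3, this yields the order 0, 1, 2, 3. Task 3 is issued only after both of its dependencies.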

setErrorCallback(...)

Registers a callback invoked when a task execution fails.

cpp
scheduler.setErrorCallback([](int taskId, cudaError_t err) {
    std::cerr << "Task " << taskId << " failed: " << cudaGetErrorString(err) << std::endl;
});

Execution model

Stream assignment

  • Source tasks (tasks with no dependencies) are distributed across the streams by stream index.
  • Dependent tasks prefer streams not used by their dependencies.
  • If all streams are occupied by dependencies, one dependency stream is reused.
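The three rules above can be sketched as a small host-only function. This is an illustration under stated assumptions, not the scheduler's actual code: the name assignStreams, the round-robin placement of source tasks, the choice of the lowest-indexed free stream, and the reuse of the first dependency's stream are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the assignment rules; not part of the DAGScheduler API.
// deps[t] lists the tasks that t depends on. Tasks must be indexed in
// topological order, so every dependency has a smaller index than its dependent.
std::vector<int> assignStreams(const std::vector<std::vector<int>>& deps,
                               int numStreams) {
    std::vector<int> streamOf(deps.size(), -1);
    int nextSourceStream = 0;  // assumption: round-robin over stream indices

    for (std::size_t t = 0; t < deps.size(); ++t) {
        if (deps[t].empty()) {
            // Rule 1: source tasks are spread across streams by index.
            streamOf[t] = nextSourceStream;
            nextSourceStream = (nextSourceStream + 1) % numStreams;
            continue;
        }

        // Record which streams the dependencies of t were placed on.
        std::vector<bool> usedByDeps(numStreams, false);
        for (int d : deps[t]) usedByDeps[streamOf[d]] = true;

        // Rule 2: prefer a stream that no dependency uses (lowest index first).
        int chosen = -1;
        for (int s = 0; s < numStreams; ++s) {
            if (!usedByDeps[s]) { chosen = s; break; }
        }

        // Rule 3: every stream is taken, so reuse a dependency's stream
        // (assumption: the first listed dependency).
        if (chosen == -1) chosen = streamOf[deps[t].front()];
        streamOf[t] = chosen;
    }
    return streamOf;
}
```

With two streams and deps = {{}, {}, {0, 1}, {0}}, the result is {0, 1, 0, 1}. Task 2 depends on both streams and falls back to reusing stream 0, while task 3 depends only on stream 0 and moves to stream 1.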

Dependency synchronization

For dependencies that cross streams, the scheduler uses CUDA events:

cpp
cudaEventRecord(event, producerStream);
cudaStreamWaitEvent(consumerStream, event, 0);
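Work issued to a single CUDA stream already runs in issue order, so only edges whose producer and consumer sit on different streams need an event. The host-only sketch below picks out those edges. The helper name crossStreamEdges and its inputs are hypothetical. hasSynchronization(fromTask, toTask) may report something similar, but this page does not specify its exact semantics.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper, not part of the DAGScheduler API.
// edges holds (producer, consumer) pairs; streamOf[t] is the stream index of task t.
// Returns the edges that would need a cudaEventRecord / cudaStreamWaitEvent pair.
std::vector<std::pair<int, int>> crossStreamEdges(
    const std::vector<std::pair<int, int>>& edges,
    const std::vector<int>& streamOf) {
    std::vector<std::pair<int, int>> result;
    for (const auto& e : edges) {
        // Same-stream edges are ordered by the stream itself and need no event.
        if (streamOf[e.first] != streamOf[e.second]) result.push_back(e);
    }
    return result;
}
```

For edges 0→2, 1→2, 0→3 with stream indices {0, 1, 0, 1}, only 1→2 and 0→3 cross streams.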

Failure propagation

When task T fails:

  1. T is marked FAILED.
  2. The error callback registered with setErrorCallback is invoked with T's task ID and the CUDA error.
  3. Downstream dependent tasks are marked FAILED recursively.
  4. Independent branches may continue.
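The recursive marking in step 3 can be sketched without a GPU. This is an illustration only: the helper propagateFailure and the adjacency-list input are assumptions, the enum mirrors the TaskState values documented on this page, and the callback from step 2 is left out.

```cpp
#include <vector>

// Mirrors the TaskState enum documented on this page.
enum class TaskState { PENDING, READY, RUNNING, COMPLETED, FAILED };

// Hypothetical helper, not part of the DAGScheduler API.
// consumers[t] lists the tasks that depend directly on task t.
// Marks `failed` and every task downstream of it as FAILED.
void propagateFailure(int failed,
                      const std::vector<std::vector<int>>& consumers,
                      std::vector<TaskState>& states) {
    if (states[failed] == TaskState::FAILED) return;  // already marked
    states[failed] = TaskState::FAILED;
    for (int next : consumers[failed]) {
        propagateFailure(next, consumers, states);
    }
}
```

With two independent chains 0→2→4 and 1→3, a failure in task 0 marks tasks 0, 2, and 4 as FAILED and leaves tasks 1 and 3 untouched, which matches step 4.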

Task states

cpp
enum class TaskState {
    PENDING,
    READY,
    RUNNING,
    COMPLETED,
    FAILED
};

CUDA Graph controls

The scheduler supports optional graph capture/replay for stable workloads:

cpp
pipeline.getScheduler().setGraphExecutionEnabled(true);

Use these signals for diagnostics:

  • hasCapturedGraph()
  • didReplayLastGraph()
  • isGraphExecutionEnabled()
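One way to combine the three signals into a single diagnostic message is sketched below. The helper describeGraphState, the FakeScheduler stand-in, and the way each combination is interpreted are assumptions, since this page does not define when a graph is captured or replayed. The template accepts any type that exposes the three query methods, so it should also accept a DAGScheduler.

```cpp
#include <string>

// Hypothetical helper, not part of the DAGScheduler API.
// Works with any type exposing the three const query methods.
template <typename Scheduler>
std::string describeGraphState(const Scheduler& s) {
    if (!s.isGraphExecutionEnabled()) return "graph execution disabled";
    if (!s.hasCapturedGraph()) return "enabled, no graph captured yet";
    return s.didReplayLastGraph() ? "last run replayed the captured graph"
                                  : "graph captured, last run did not replay";
}

// Stand-in used only to exercise the helper without a GPU.
struct FakeScheduler {
    bool enabled;
    bool captured;
    bool replayed;
    bool isGraphExecutionEnabled() const { return enabled; }
    bool hasCapturedGraph() const { return captured; }
    bool didReplayLastGraph() const { return replayed; }
};
```

Logging this string after each execute call is a cheap way to see whether a workload is stable enough to replay.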

Tuning guidance

Topology type                     Suggested streams
Single operator / linear chain    1-2
Moderate DAG (3-6 operators)      2-4
Fork-join topology                4
Complex parallel DAG              4-8

Treat these values as starting points, and prefer empirical tuning guided by the benchmark pages and profiler traces.

Released under the MIT License.