Skip to content

DAG Scheduling

Detailed explanation of the DAG-based task scheduling algorithm.

Overview

Mini-ImagePipe uses a Directed Acyclic Graph (DAG) to represent operator dependencies. The scheduler automatically determines execution order and parallelism.

Kahn's Algorithm

Our topological sort uses Kahn's algorithm (1962):

1. Compute in-degree for all nodes
2. Add all nodes with in-degree 0 to queue
3. While queue is not empty:
   a. Remove node from queue
   b. Add to sorted list
   c. For each neighbor, decrement in-degree
   d. If in-degree becomes 0, add to queue
4. If sorted list has all nodes → valid DAG
5. Otherwise → cycle detected

Implementation

cpp
std::vector<int> TaskGraph::getTopologicalOrder() const {
    std::vector<int> inDegree(nodes_.size(), 0);
    for (const auto& node : nodes_) {
        inDegree[node.id] = static_cast<int>(node.dependencies.size());
    }

    std::queue<int> q;
    for (size_t i = 0; i < nodes_.size(); ++i) {
        if (inDegree[i] == 0) {
            q.push(static_cast<int>(i));
        }
    }

    std::vector<int> order;
    while (!q.empty()) {
        int current = q.front();
        q.pop();
        order.push_back(current);

        for (int dependent : nodes_[current].dependents) {
            if (--inDegree[dependent] == 0) {
                q.push(dependent);
            }
        }
    }

    return order.size() == nodes_.size() ? order : std::vector<int>{};
}

Cycle Detection

DFS-based cycle detection validates the graph before execution:

cpp
bool hasCycle = !graph.validate();
if (hasCycle) {
    return cudaErrorInvalidValue;
}

Level-Based Execution

Tasks are organized into levels based on dependencies:

Level 0:  [Source A]  [Source B]      ← Execute in parallel
Level 1:  [Process C]                  ← Waits for A
Level 2:  [Process D]                  ← Waits for B, C
Level 3:  [Sink E]                     ← Waits for D

Multi-Stream Execution

Stream Assignment Strategy

  1. Tasks at the same level are distributed across available streams
  2. Round-robin assignment for load balancing
  3. CUDA events synchronize cross-stream dependencies

Error Propagation

When a task fails:

  1. Task state → FAILED
  2. Error callback invoked
  3. All downstream tasks marked FAILED (skipped)
  4. Independent tasks continue execution

References

  • Kahn, A. B. (1962). "Topological sorting of large networks." Commun. ACM.
  • Cormen, T. H., et al. (2009). Introduction to Algorithms. MIT Press.

Released under the MIT License.