# Architecture

This page explains the internal architecture and design of HTS.

## High-Level Architecture
```
┌─────────────────────────────────────────────────┐
│                User Application                 │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│              TaskGraph Builder API              │
│  - Task creation and configuration              │
│  - Dependency management                        │
│  - Validation and optimization                  │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│                 Scheduler Core                  │
│  - TaskGraph management                         │
│  - Dependency tracking                          │
│  - Ready queue management                       │
│  - Scheduling policy selection                  │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│                Execution Engine                 │
│  ┌──────────────┐      ┌──────────────────────┐ │
│  │  CPU Thread  │      │     CUDA Streams     │ │
│  │     Pool     │      │       Manager        │ │
│  └──────────────┘      └──────────────────────┘ │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│                Memory Pool (GPU)                │
│  - Buddy system allocator                       │
│  - O(log n) allocation                          │
│  - Defragmentation support                      │
└─────────────────────────────────────────────────┘
```
## Core Components

### 1. TaskGraph
The TaskGraph class manages the DAG (Directed Acyclic Graph) of tasks:
- Task Storage: Maintains all tasks with their configurations
- Dependency Tracking: Tracks predecessor/successor relationships
- Topological Sorting: Provides execution order
- Cycle Detection: Validates DAG structure
- Ready Queue: Identifies tasks ready for execution
Key Files: `include/hts/task_graph.hpp`, `src/core/task_graph.cpp`
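The topological sorting, cycle detection, and ready-queue duties above all reduce to bookkeeping over predecessor counts (Kahn's algorithm). The sketch below is illustrative only, not HTS's actual implementation; it represents the graph as adjacency lists of successor indices:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Kahn's algorithm: returns tasks in dependency order, or an empty
// vector if the graph contains a cycle (i.e. it is not a valid DAG).
std::vector<int> topological_order(
    const std::vector<std::vector<int>>& successors) {
    std::vector<int> in_degree(successors.size(), 0);
    for (const auto& succ : successors)
        for (int s : succ) ++in_degree[s];

    std::queue<int> ready;  // tasks with no unfinished predecessors
    for (std::size_t i = 0; i < successors.size(); ++i)
        if (in_degree[i] == 0) ready.push(static_cast<int>(i));

    std::vector<int> order;
    while (!ready.empty()) {
        int t = ready.front();
        ready.pop();
        order.push_back(t);
        for (int s : successors[t])  // "finishing" t unblocks successors
            if (--in_degree[s] == 0) ready.push(s);
    }
    if (order.size() != successors.size()) return {};  // cycle detected
    return order;
}
```

The same in-degree bookkeeping doubles as the ready queue at run time: whenever a task completes, any successor whose count drops to zero becomes eligible for dispatch.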
### 2. Task and TaskContext
Each task is represented by a Task object with an associated TaskContext:
Task Properties:

```
id: uint64_t               // Unique identifier
name: string               // Human-readable name
device_type: DeviceType    // CPU or GPU
priority: int              // Scheduling priority
status: TaskStatus         // Current execution state
retry_policy: RetryPolicy  // Failure handling
```
TaskContext: Provides runtime information and utilities to task functions:

```
get_task_id()
get_device_type()
get_execution_time()
get_retry_count()
```
Key Files: `include/hts/task.hpp`, `include/hts/task_context.hpp`, `src/core/task.cpp`
### 3. Scheduler
The Scheduler orchestrates the entire execution:
Responsibilities:
- Initialize and validate TaskGraph
- Maintain execution state
- Select ready tasks based on policy
- Dispatch tasks to appropriate executors
- Track completion and handle failures
- Collect profiling information
Execution Flow:

1. `init(&graph)` - Validate and prepare
2. `execute()` - Start execution
3. Policy selects ready tasks
4. Tasks dispatched to CPU threads or GPU streams
5. `wait_for_completion()` - Block until done
6. Collect stats and profiling data
Key Files: `include/hts/scheduler.hpp`, `src/cuda/scheduler.cu`
### 4. Scheduling Policies
HTS uses a pluggable policy architecture:
```cpp
class SchedulingPolicy {
    virtual Task* select_next(
        const std::vector<Task*>& ready_queue
    ) = 0;
};
```
Built-in Policies:
| Policy | Strategy | Use Case |
|---|---|---|
| `GPUPriorityPolicy` | Prefer GPU tasks | GPU-heavy workloads |
| `CPUPriorityPolicy` | Prefer CPU tasks | CPU preprocessing |
| `RoundRobinPolicy` | Alternate CPU/GPU | Balanced workloads |
| `LoadBasedPolicy` | Select by current load | Dynamic workloads |
Key Files: `include/hts/scheduling_policy.hpp`
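A custom policy is a subclass that picks one task from the ready queue. The sketch below implements the "prefer GPU tasks" behavior described for `GPUPriorityPolicy`; the minimal `Task` struct and the priority tie-breaking are assumptions for illustration, not HTS's exact definitions:

```cpp
#include <vector>

enum class DeviceType { CPU, GPU };

struct Task {
    DeviceType device_type;
    int priority;
};

class SchedulingPolicy {
public:
    virtual ~SchedulingPolicy() = default;
    virtual Task* select_next(const std::vector<Task*>& ready_queue) = 0;
};

// GPU-first sketch: pick the highest-priority GPU task if any exists,
// otherwise the highest-priority task of any kind.
class GPUPriorityPolicy : public SchedulingPolicy {
public:
    Task* select_next(const std::vector<Task*>& ready_queue) override {
        Task* best = nullptr;
        for (Task* t : ready_queue) {
            if (!best) { best = t; continue; }
            bool t_gpu = (t->device_type == DeviceType::GPU);
            bool b_gpu = (best->device_type == DeviceType::GPU);
            if (t_gpu != b_gpu) {      // device preference dominates
                if (t_gpu) best = t;
                continue;
            }
            if (t->priority > best->priority) best = t;  // then priority
        }
        return best;  // nullptr when the ready queue is empty
    }
};
```

Because the scheduler only ever calls `select_next`, swapping policies at runtime is a matter of holding a different `SchedulingPolicy` pointer.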
### 5. Memory Pool
GPU memory management uses a buddy system allocator:
Features:

- Eliminates `cudaMalloc`/`cudaFree` overhead
- O(log n) allocation time
- Automatic defragmentation
- Configurable pool size
Allocation Flow:

1. Task requests GPU memory
2. MemoryPool finds suitable block
3. Splits blocks if needed (buddy system)
4. Returns pointer
5. On free, merges with buddy if possible
Key Files: `include/hts/memory_pool.hpp`, `src/cuda/memory_pool.cu`
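The split/merge steps above rest on a simple address property: a power-of-two block at offset `o` has its buddy at `o XOR size`, so finding the merge partner is O(1) per level and a full allocation walks at most log n levels. A minimal sketch of that arithmetic (not HTS's actual allocator):

```cpp
#include <cstddef>

// Round a request up to the next power of two - the only block sizes
// a buddy system hands out.
std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Offset of the buddy of the block at `offset` with size `block_size`.
// Splitting a block of size 2s at offset o yields buddies at o and
// o + s; XOR with the size flips exactly the bit distinguishing them.
std::size_t buddy_of(std::size_t offset, std::size_t block_size) {
    return offset ^ block_size;
}
```

On free, the allocator checks whether `buddy_of(offset, size)` is also free at the same size; if so, the pair merges into one block of twice the size and the check repeats one level up. That cascade is what keeps fragmentation bounded without a separate compaction pass.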
### 6. Stream Manager
Manages CUDA streams for concurrent GPU execution:
Capabilities:
- Create and manage multiple streams
- Stream priority support
- Automatic stream reuse
- Synchronization primitives
Key Files: `include/hts/stream_manager.hpp`, `src/cuda/stream_manager.cu`
### 7. Execution Engine
Dispatches tasks to CPU threads or GPU streams:
CPU Execution:
- Thread pool for parallel execution
- Work-stealing support
- Affinity configuration
GPU Execution:
- CUDA stream management
- Kernel launch coordination
- Memory transfer handling
Key Files: `include/hts/execution_engine.hpp`, `src/cuda/execution_engine.cu`
### 8. Profiler
Built-in performance monitoring:
Metrics Collected:
- Task execution times
- Device utilization
- Memory allocation patterns
- Dependency wait times
- Parallelism metrics
Export:
- JSON format
- Chrome tracing format
- CSV format
Key Files: `include/hts/profiler.hpp`
## Design Principles

### Zero-Overhead Abstraction
HTS follows the C++ principle of "you don't pay for what you don't use":
- No virtual calls in hot paths (when not using polymorphic features)
- Compile-time device type selection when possible
- Inline functions for simple operations
- Template metaprogramming for type safety
### Lock-Free Where Possible
Critical paths use lock-free data structures:
- Atomic operations for status updates
- Lock-free queues for ready tasks
- Compare-and-swap for state transitions
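A compare-and-swap state transition can be sketched with `std::atomic`; the `TaskStatus` values mirror the task properties listed earlier, but the function itself is illustrative rather than HTS's actual code:

```cpp
#include <atomic>

enum class TaskStatus { Pending, Running, Completed, Failed };

// Attempt the Pending -> Running transition without a lock. Exactly
// one caller wins the CAS; every other caller sees the task already
// claimed and backs off.
bool try_claim(std::atomic<TaskStatus>& status) {
    TaskStatus expected = TaskStatus::Pending;
    return status.compare_exchange_strong(expected, TaskStatus::Running);
}
```

Because the check and the update happen in one atomic step, two worker threads racing for the same ready task can never both start executing it, and no mutex sits on the dispatch hot path.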
### Error Resilience

- Comprehensive error codes (see `types.hpp`)
- Retry policies for transient failures
- Graceful degradation on errors
- Detailed error messages with context
## Threading Model
```
Main Thread
│
├──► Scheduler Thread
│    │
│    ├──► CPU Thread Pool (8 threads)
│    │    ├──► Worker Thread 1
│    │    ├──► Worker Thread 2
│    │    └──► ...
│    │
│    └──► GPU Streams (4 streams)
│         ├──► Stream 0
│         ├──► Stream 1
│         └──► ...
│
└──► Profiler Thread (optional)
```
## Next Steps
- Task Graph — Deep dive into DAG management
- Scheduling — Scheduling policies in detail
- Memory — Memory pool implementation
- API Reference — Complete API documentation