ADDED Requirements
Requirement: Async-Ready Device Allocation
The memory manager SHALL expose a configurable device allocator mode so that stream-ordered async allocation can be enabled for throughput-oriented workloads without changing pipeline call sites.
Scenarios
Scenario: Effective allocator mode
- WHEN async allocation is requested but not supported by the current CUDA runtime
- THEN the memory manager SHALL report the effective fallback allocator mode explicitly
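The fallback rule can be sketched as a small pure function. This is a language-neutral model in Python, not the project's memory manager. AllocatorMode, resolve_allocator_mode and the report format are hypothetical. The async_supported flag stands in for a runtime capability query; in a CUDA build, one plausible source is cudaDeviceGetAttribute with cudaDevAttrMemoryPoolsSupported.

```python
from dataclasses import dataclass
from enum import Enum


class AllocatorMode(Enum):
    # Hypothetical mode names; the spec does not define them.
    SYNC = "sync"
    ASYNC = "async"


@dataclass(frozen=True)
class AllocatorSelection:
    requested: AllocatorMode
    effective: AllocatorMode
    fell_back: bool  # True when the requested mode could not be honoured


def resolve_allocator_mode(requested: AllocatorMode, async_supported: bool) -> AllocatorSelection:
    """Pick the allocator mode that will actually be used.

    async_supported stands in for a runtime capability query; it is a plain
    argument here so the decision can be tested without a GPU.
    """
    if requested is AllocatorMode.ASYNC and not async_supported:
        return AllocatorSelection(requested, AllocatorMode.SYNC, fell_back=True)
    return AllocatorSelection(requested, requested, fell_back=False)


def describe(sel: AllocatorSelection) -> str:
    # Explicit report string, so logs show the effective mode, not just the requested one.
    suffix = " (fallback)" if sel.fell_back else ""
    return f"requested={sel.requested.value} effective={sel.effective.value}{suffix}"


sel = resolve_allocator_mode(AllocatorMode.ASYNC, async_supported=False)
print(describe(sel))  # requested=async effective=sync (fallback)
```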
Scenario: Stream-ordered allocation
- WHEN async allocation is enabled and supported
- THEN device allocations and frees SHALL use stream-ordered allocator APIs
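One way to keep pipeline call sites unchanged is a facade whose allocate/free signatures always take a stream, with the resolved mode selecting the backend call. The sketch below is hypothetical: DeviceAllocator, RecordingBackend and the integer stream handle are illustrative. Instead of calling CUDA, it records which runtime API a real backend would use: cudaMallocAsync/cudaFreeAsync in async mode, and cudaMalloc/cudaFree otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingBackend:
    """Stand-in for the CUDA runtime: records which API a real backend would call."""
    calls: list = field(default_factory=list)
    next_ptr: int = 0x1000

    def fake_ptr(self, nbytes: int) -> int:
        # Hand out increasing fake addresses; no memory is actually allocated.
        ptr = self.next_ptr
        self.next_ptr += nbytes
        return ptr


class DeviceAllocator:
    """Hypothetical facade: call sites always pass a stream, whatever the mode."""

    def __init__(self, effective_mode: str, backend: RecordingBackend):
        self.mode = effective_mode  # "sync" or "async", already resolved at startup
        self.backend = backend

    def allocate(self, nbytes: int, stream: int) -> int:
        ptr = self.backend.fake_ptr(nbytes)
        if self.mode == "async":
            # Stream-ordered: the allocation is ordered with work queued on `stream`.
            self.backend.calls.append(("cudaMallocAsync", ptr, stream))
        else:
            # Synchronous path: the stream argument is accepted but unused.
            self.backend.calls.append(("cudaMalloc", ptr, None))
        return ptr

    def free(self, ptr: int, stream: int) -> None:
        if self.mode == "async":
            # Stream-ordered: the free takes effect after work already queued on `stream`.
            self.backend.calls.append(("cudaFreeAsync", ptr, stream))
        else:
            self.backend.calls.append(("cudaFree", ptr, None))


backend = RecordingBackend()
alloc = DeviceAllocator("async", backend)
stream = 7  # hypothetical stream handle
p = alloc.allocate(256, stream)
alloc.free(p, stream)
print([api for api, _, _ in backend.calls])  # ['cudaMallocAsync', 'cudaFreeAsync']
```

Resolving the mode once, at construction, keeps the per-allocation branch trivial and makes the effective mode easy to report.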
Requirement: Scheduler Graph Replay
The scheduler SHALL support CUDA Graph capture and replay for stable workloads so that repeated pipeline executions incur less CPU launch overhead.
Scenarios
Scenario: Reusing a captured graph
- WHEN the same stable workload executes repeatedly with graph mode enabled
- THEN the scheduler SHALL reuse the captured graph instead of recapturing it on every run
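Capture-once, replay-many behaviour can be modelled without a GPU. In this hypothetical sketch, record_launches stands in for the CPU code that issues kernels, and the stored tuple stands in for an instantiated graph. The CUDA calls named in the docstring mark where a real implementation would plug in.

```python
class GraphRunner:
    """Capture-once / replay-many model of a graph-mode scheduler (hypothetical names).

    In a CUDA build, capture would wrap the pipeline's launches between
    cudaStreamBeginCapture and cudaStreamEndCapture, then cudaGraphInstantiate;
    replay would be a single cudaGraphLaunch.
    """

    def __init__(self, record_launches):
        self.record_launches = record_launches  # returns the kernels one run issues
        self.graph = None                       # stand-in for the instantiated graph
        self.captures = 0
        self.executed = []

    def run(self) -> None:
        if self.graph is None:
            # Only the first run pays the CPU cost of issuing individual launches.
            self.graph = tuple(self.record_launches())
            self.captures += 1
        # Every run, including the first, replays the stored graph.
        self.executed.extend(self.graph)


issue_calls = []


def record_launches():
    issue_calls.append(1)
    return ["decode", "resize", "normalize"]


runner = GraphRunner(record_launches)
for _ in range(5):
    runner.run()
print(runner.captures, len(issue_calls), len(runner.executed))  # 1 1 15
```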
Scenario: Invalidating a captured graph
- WHEN workload shape or topology changes
- THEN the scheduler SHALL invalidate the previously captured graph before the next replay
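Invalidation needs a key that covers everything the captured graph depends on. This sketch assumes that key is the set of input shapes plus the node graph's edge list. workload_signature and SignatureGraphCache are hypothetical names, and a real scheduler may need further fields, such as dtypes or the buffer addresses baked into captured kernel parameters.

```python
def workload_signature(input_shapes, edges):
    """Hypothetical cache key: every input shape plus the node graph's edge list."""
    return (tuple(sorted(input_shapes.items())), tuple(sorted(edges)))


class SignatureGraphCache:
    """Single-entry graph cache that is invalidated when the signature changes."""

    def __init__(self):
        self.signature = None
        self.captures = 0
        self.invalidations = 0

    def prepare(self, signature) -> str:
        if self.signature is not None and self.signature != signature:
            # Shape or topology changed: the old graph must not be replayed.
            self.signature = None
            self.invalidations += 1
        if self.signature is None:
            self.signature = signature
            self.captures += 1
            return "capture"
        return "replay"


edges = [("decode", "resize"), ("resize", "normalize")]
a = workload_signature({"frames": (8, 3, 224, 224)}, edges)
b = workload_signature({"frames": (8, 3, 256, 256)}, edges)
cache = SignatureGraphCache()
plan = [cache.prepare(s) for s in (a, a, b, b, a)]
print(plan)  # ['capture', 'replay', 'capture', 'replay', 'capture']
```

CUDA also provides cudaGraphExecUpdate, which can refresh an instantiated graph in place when the topology is unchanged. This sketch ignores that option and conservatively recaptures on any change.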
Requirement: Fixed-Shape Batch Execution
The pipeline SHALL execute fixed-shape batches as a single runtime batch context so that batch execution is not implemented as a thin loop over single-frame execution.
Scenarios
Scenario: Batch metadata propagation
- WHEN executeBatch() is called with a fixed-shape batch
- THEN operators SHALL receive batch metadata through ImageBuffer
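The following is a minimal sketch of a batch buffer that carries its own metadata, so an operator can stride through all frames in one call. ImageBuffer here is a hypothetical stand-in, since this spec does not give its real fields. make_batch_buffer and per_frame_sum are illustrative names and do not represent executeBatch() itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BatchMeta:
    batch_size: int
    height: int
    width: int
    channels: int


@dataclass
class ImageBuffer:
    """Hypothetical stand-in for the project's ImageBuffer; its real fields are not specified."""
    data: bytearray   # one contiguous allocation holding every frame
    meta: BatchMeta   # batch metadata that operators read

    def frame_bytes(self) -> int:
        return self.meta.height * self.meta.width * self.meta.channels


def make_batch_buffer(frames: List[bytes], height: int, width: int, channels: int) -> ImageBuffer:
    frame_size = height * width * channels
    # Fixed-shape batch: every frame must have exactly the same size.
    if any(len(f) != frame_size for f in frames):
        raise ValueError("fixed-shape batch requires identically sized frames")
    return ImageBuffer(bytearray(b"".join(frames)), BatchMeta(len(frames), height, width, channels))


def per_frame_sum(buf: ImageBuffer) -> List[int]:
    # An operator uses the metadata to stride through the whole batch in one call.
    n = buf.frame_bytes()
    return [sum(buf.data[i * n:(i + 1) * n]) for i in range(buf.meta.batch_size)]


buf = make_batch_buffer([bytes([1] * 12), bytes([2] * 12)], height=2, width=2, channels=3)
print(buf.meta.batch_size, per_frame_sum(buf))  # 2 [12, 24]
```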
Scenario: One invocation per node
- WHEN the runtime executes a fixed-shape batch
- THEN each node SHALL execute once per batch context instead of once per frame
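The difference between one invocation per node and a thin per-frame loop shows up directly in invocation counts. Node, run_batch and run_per_frame are hypothetical names used only for this sketch.

```python
class Node:
    """Hypothetical operator node that counts how often it is invoked."""

    def __init__(self, name: str):
        self.name = name
        self.invocations = 0

    def execute(self, batch: list) -> list:
        self.invocations += 1
        return batch  # a real operator would transform the whole batch here


def run_batch(nodes, batch):
    # One pass over topologically ordered nodes for the whole batch context.
    for node in nodes:
        batch = node.execute(batch)
    return batch


def run_per_frame(nodes, frames):
    # The pattern the requirement rules out: a thin loop over single-frame runs.
    return [run_batch(nodes, [frame]) for frame in frames]


batched = [Node("resize"), Node("normalize")]
run_batch(batched, list(range(8)))
looped = [Node("resize"), Node("normalize")]
run_per_frame(looped, list(range(8)))
print([n.invocations for n in batched], [n.invocations for n in looped])  # [1, 1] [8, 8]
```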
Requirement: Throughput Validation Tooling
The project SHALL provide benchmark and GPU-validation entry points so that throughput-oriented changes can be measured and validated in engineering workflows.
Scenarios
Scenario: Benchmark entry point
- WHEN a developer needs to measure throughput
- THEN the repository SHALL provide a supported benchmark target or benchmark mode
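This spec does not name the repository's benchmark target, so the following is only a hypothetical harness. It shows what such a mode would need to measure: warm-up runs excluded from timing, repeated samples, and a median converted to frames per second. The benchmark function and its parameters are illustrative.

```python
import time
from statistics import median


def benchmark(step, frames_per_step: int, warmup: int = 3, iters: int = 20) -> dict:
    """Minimal throughput harness; `step` must block until its work has finished.

    For GPU work that means synchronizing the stream inside `step`; otherwise
    only the asynchronous launch cost is timed.
    """
    for _ in range(warmup):
        step()  # keeps one-off costs (e.g. graph capture, pool growth) out of the samples
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    med = median(samples)
    return {"median_s": med, "frames_per_s": frames_per_step / med if med > 0 else float("inf")}


calls = []
result = benchmark(lambda: calls.append(sum(range(10_000))), frames_per_step=8)
print(f"{result['frames_per_s']:.0f} frames/s (median of 20 runs)")
```

Using the median rather than the mean keeps a single slow sample, such as a scheduler hiccup, from skewing the reported throughput.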
Scenario: GPU validation path
- WHEN CI or local development runs on GPU-capable infrastructure
- THEN the project SHALL provide an explicit GPU validation path rather than relying only on best-effort CPU-only execution
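One way to make the GPU path explicit is a gate that fails when a job declares it requires a GPU but finds none, instead of silently falling back to CPU-only execution. The REQUIRE_GPU variable, ValidationPlan and plan_validation are hypothetical. In a real job, device_count would come from a runtime query such as cudaGetDeviceCount.

```python
import os
from enum import Enum


class ValidationPlan(Enum):
    GPU = "gpu"
    CPU_ONLY = "cpu-only"
    FAIL = "fail"


def plan_validation(gpu_required: bool, device_count: int) -> ValidationPlan:
    if device_count > 0:
        return ValidationPlan.GPU
    if gpu_required:
        # On a GPU runner, a missing device is an infrastructure error, not a skip.
        return ValidationPlan.FAIL
    # Best-effort local or CPU-only CI run.
    return ValidationPlan.CPU_ONLY


def gpu_required_from_env(env=os.environ) -> bool:
    # REQUIRE_GPU is a hypothetical variable that a GPU CI job would set to "1".
    return env.get("REQUIRE_GPU", "0") == "1"


print(plan_validation(gpu_required_from_env({"REQUIRE_GPU": "1"}), device_count=0).value)  # fail
```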