ADDED Requirements

Requirement: Async-Ready Device Allocation

The memory manager SHALL expose a configurable device allocator mode so that stream-ordered async allocation can be enabled for throughput-oriented workloads without changing pipeline call sites.

Scenarios

Scenario: Effective allocator mode

WHEN async allocation is requested but not supported by the current CUDA runtime
THEN the memory manager SHALL fall back to a supported allocator mode and report the effective mode explicitly

Scenario: Stream-ordered allocation

WHEN async allocation is enabled and supported
THEN device allocations and frees SHALL use stream-ordered allocator APIs such as cudaMallocAsync and cudaFreeAsync
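
The two allocator scenarios above can be sketched as a small mode-resolution step. This is a minimal host-only sketch, not the project's implementation: the names AllocatorMode, AllocatorReport, resolveAllocatorMode, and describe are hypothetical, and the asyncSupported flag stands in for a real capability probe.

```cpp
#include <cassert>
#include <string>

// All names here are hypothetical; the spec does not define them.
enum class AllocatorMode { Synchronous, StreamOrderedAsync };

struct AllocatorReport {
    AllocatorMode requested;
    AllocatorMode effective;
    bool fellBack;  // true when the effective mode differs from the requested one
};

// `asyncSupported` is an assumed input. With the CUDA runtime it could be
// derived from cudaDeviceGetAttribute(..., cudaDevAttrMemoryPoolsSupported, ...).
inline AllocatorReport resolveAllocatorMode(AllocatorMode requested, bool asyncSupported) {
    AllocatorMode effective = requested;
    if (requested == AllocatorMode::StreamOrderedAsync && !asyncSupported) {
        effective = AllocatorMode::Synchronous;  // fall back instead of failing
    }
    return AllocatorReport{requested, effective, effective != requested};
}

// Human-readable form of the effective mode, so the fallback is reported explicitly.
inline std::string describe(const AllocatorReport& r) {
    std::string name =
        r.effective == AllocatorMode::StreamOrderedAsync ? "stream-ordered async" : "synchronous";
    return r.fellBack ? name + " (fallback)" : name;
}

// Usage sketch: async is requested on a runtime that does not support it.
inline void allocatorModeExample() {
    AllocatorReport r = resolveAllocatorMode(AllocatorMode::StreamOrderedAsync, false);
    assert(r.fellBack);
    assert(describe(r) == "synchronous (fallback)");
}
```

Returning both the requested and the effective mode keeps call sites unchanged while still letting logs or diagnostics show that a fallback occurred.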

Requirement: Scheduler Graph Replay

The scheduler SHALL support CUDA Graph capture and replay for stable workloads so that repeated pipeline executions can reduce CPU launch overhead.

Scenarios

Scenario: Reusing a captured graph

WHEN the same stable workload executes repeatedly with graph mode enabled
THEN the scheduler SHALL reuse the captured graph instead of recapturing it on every run

Scenario: Invalidating a captured graph

WHEN the workload shape or topology changes after a graph has been captured
THEN the scheduler SHALL invalidate the previously captured graph before the next replay
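
The reuse and invalidation scenarios above can be sketched as a cache keyed on a workload signature. This is a minimal host-only sketch under stated assumptions: WorkloadSignature and GraphCache are hypothetical names, the signature fields are illustrative, and the CUDA Graph calls a real scheduler would make are only named in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of what makes two runs "the same stable workload".
struct WorkloadSignature {
    std::vector<int> shape;     // e.g. {batch, channels, height, width}
    std::size_t topologyHash;   // assumed hash of the node graph
    bool operator==(const WorkloadSignature& o) const {
        return shape == o.shape && topologyHash == o.topologyHash;
    }
};

class GraphCache {
public:
    // Returns true when an existing capture was replayed,
    // false when a fresh capture had to be made first.
    bool run(const WorkloadSignature& sig) {
        if (hasCapture_ && !(captured_ == sig)) {
            invalidate();  // shape or topology changed: drop the stale graph
        }
        if (!hasCapture_) {
            // Real code: cudaStreamBeginCapture ... cudaStreamEndCapture,
            // then cudaGraphInstantiate.
            captured_ = sig;
            hasCapture_ = true;
            ++captureCount_;
            return false;
        }
        ++replayCount_;  // Real code: cudaGraphLaunch on the cached executable graph.
        return true;
    }

    // Real code: cudaGraphExecDestroy and cudaGraphDestroy.
    void invalidate() { hasCapture_ = false; }

    int captureCount() const { return captureCount_; }
    int replayCount() const { return replayCount_; }

private:
    WorkloadSignature captured_{};
    bool hasCapture_ = false;
    int captureCount_ = 0;
    int replayCount_ = 0;
};

// Usage sketch: two identical runs capture once, then replay.
inline void graphCacheExample() {
    GraphCache cache;
    WorkloadSignature sig{{8, 3, 224, 224}, 42};
    assert(!cache.run(sig));  // first run captures
    assert(cache.run(sig));   // second run replays
}
```

Comparing the signature before every run means invalidation always happens before the next replay, which is the ordering the second scenario asks for.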

Requirement: Fixed-Shape Batch Execution

The pipeline SHALL execute fixed-shape batches as a single runtime batch context so that batch execution is not implemented as a thin loop over single-frame execution.

Scenarios

Scenario: Batch metadata propagation

WHEN executeBatch() is called with a fixed-shape batch
THEN operators SHALL receive batch metadata through ImageBuffer

Scenario: One invocation per node

WHEN the runtime executes a fixed-shape batch
THEN each node SHALL execute once per batch context instead of once per frame
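
The batch scenarios above can be sketched as follows. This is a minimal host-only sketch, not the project's code: the ImageBuffer fields, the Node type, and the executeBatch signature shown here are assumptions made for illustration, and a std::vector stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's ImageBuffer; only the batch
// metadata matters for this sketch.
struct ImageBuffer {
    int batchSize = 1;
    int width = 0;
    int height = 0;
    int channels = 0;
    std::vector<float> data;  // host stand-in for device memory, N x H x W x C

    std::size_t frameStride() const {
        return static_cast<std::size_t>(width) * height * channels;
    }
};

// Hypothetical operator node that scales every element by `gain`.
struct Node {
    float gain = 1.0f;
    int invocations = 0;

    void execute(ImageBuffer& buf) {
        ++invocations;
        // One call covers every frame in the batch. A GPU operator would
        // launch a single kernel over batchSize * frameStride() elements.
        for (float& v : buf.data) {
            v *= gain;
        }
    }
};

// Assumed signature: each node runs once per batch context, not once per frame.
inline void executeBatch(std::vector<Node>& nodes, ImageBuffer& batch) {
    assert(batch.data.size() ==
           static_cast<std::size_t>(batch.batchSize) * batch.frameStride());
    for (Node& node : nodes) {
        node.execute(batch);
    }
}
```

Because the batch size travels inside the buffer, an operator can size its work from the metadata it receives instead of being called in a per-frame loop.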

Requirement: Throughput Validation Tooling

The project SHALL provide benchmark and GPU-validation entry points so that throughput-oriented changes can be measured and validated in engineering workflows.

Scenarios

Scenario: Benchmark entry point

WHEN a developer needs to measure throughput
THEN the repository SHALL provide a supported benchmark target or benchmark mode

Scenario: GPU validation path

WHEN CI or local development runs on GPU-capable infrastructure
THEN the project SHALL provide an explicit GPU validation path rather than relying solely on best-effort CPU-only execution
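
The benchmark entry point described above could be built around a small timing helper. This is a minimal sketch under stated assumptions: framesPerSecond and measureThroughput are hypothetical names, and the warmup and iteration parameters are illustrative choices rather than anything the spec prescribes.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Frames per second from a frame count and elapsed wall-clock time.
// Returns 0.0 when no measurable time has elapsed.
inline double framesPerSecond(long long frames, std::chrono::duration<double> elapsed) {
    if (elapsed.count() <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(frames) / elapsed.count();
}

// Runs `runBatch` for `warmup` untimed iterations, then times `iterations`
// further runs and returns the measured throughput in frames per second.
inline double measureThroughput(const std::function<void()>& runBatch,
                                int framesPerBatch, int warmup, int iterations) {
    assert(iterations > 0);
    for (int i = 0; i < warmup; ++i) {
        runBatch();  // untimed: lets caches, pools, and captured graphs settle
    }
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        runBatch();
    }
    // A GPU benchmark would synchronize here (for example with
    // cudaStreamSynchronize) before stopping the clock.
    auto stop = std::chrono::steady_clock::now();
    return framesPerSecond(static_cast<long long>(framesPerBatch) * iterations,
                           stop - start);
}
```

Separating warmup from timed iterations keeps one-time costs, such as an initial graph capture, out of the reported throughput.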