Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.1.0] - 2026-04-30

Added

Multi-input pipeline execution: Correct dependency routing for fork-join and merge topologies
Operator workspace lifecycle: initialize(), shutdown(), getWorkspaceRequirements() hooks
Stream-aware memory allocation: Device allocation/free with CUDA stream parameter
Profiling context propagation: Runtime execution context carries profiling state
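
The workspace lifecycle hooks listed above could be used roughly as follows. This is a minimal sketch under stated assumptions: only the hook names initialize(), shutdown(), and getWorkspaceRequirements() come from this changelog. The signatures, the WorkspaceRequirements struct, ScratchOperator, and runLifecycle() are hypothetical, and host memory stands in for a device buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of the scratch memory an operator needs.
struct WorkspaceRequirements {
    std::size_t bytes = 0;      // scratch bytes needed per invocation
    std::size_t alignment = 1;  // required alignment (ignored in this sketch)
};

// Hypothetical operator base class; the project's real signatures may differ.
class Operator {
public:
    virtual ~Operator() = default;
    virtual WorkspaceRequirements getWorkspaceRequirements() const { return {}; }
    virtual bool initialize(void* /*workspace*/, std::size_t /*bytes*/) { return true; }
    virtual void shutdown() {}
};

// Example operator that requests 256 bytes of scratch space.
class ScratchOperator : public Operator {
public:
    WorkspaceRequirements getWorkspaceRequirements() const override { return {256, 16}; }
    bool initialize(void* workspace, std::size_t bytes) override {
        ready_ = workspace != nullptr && bytes >= 256;
        return ready_;
    }
    void shutdown() override { ready_ = false; }
    bool ready() const { return ready_; }

private:
    bool ready_ = false;
};

// Drives one lifecycle: query requirements, allocate, initialize, shut down.
// Returns whether initialize() succeeded.
bool runLifecycle(Operator& op) {
    const WorkspaceRequirements req = op.getWorkspaceRequirements();
    std::vector<unsigned char> workspace(req.bytes);  // host stand-in for device memory
    const bool ok = op.initialize(workspace.data(), workspace.size());
    op.shutdown();
    return ok;
}
```

Querying requirements before initialize() lets the caller own the allocation, so one buffer can be sized once and reused across invocations.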

Async device allocator mode controls: Support for cudaMallocAsync when available
DAG scheduler graph capture/replay state: CUDA graph optimization hooks
Real batch execution path: executeBatch() with proper metadata and invocation semantics
Benchmark pipeline example: examples/benchmark_pipeline.cpp

CV-CUDA operator surface: Optional CvcudaResizeOperator (dependency-gated)
TensorRT inference operator surface: Optional TensorRtInferenceOperator (dependency-gated)
GStreamer/DeepStream bridge surface: Optional integration (dependency-gated)
Backend capability registry: Query available ecosystem backends at runtime

Fixed

Memory manager deadlock prevention: Replace nested lock_guard with scoped_lock
Null pointer checks for malloc: Proper error handling on allocation failures
CUDA error handling improvements: Consistent error checking across operators
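
To illustrate the deadlock fix: nested std::lock_guard calls can deadlock when two threads take the same pair of mutexes in opposite order, whereas std::scoped_lock (C++17) acquires all of its mutexes with a deadlock-avoidance algorithm. The sketch below is not the project's MemoryManager code; Pool, transfer(), and runOpposingTransfers() are hypothetical names used only for illustration.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for a mutex-protected pool.
struct Pool {
    std::mutex mtx;
    long bytes = 0;
};

// Moves `n` bytes of accounting from one pool to the other.
// std::scoped_lock locks both mutexes together, so transfer(a, b) and
// transfer(b, a) can run concurrently without deadlocking.
void transfer(Pool& from, Pool& to, long n) {
    if (&from == &to) return;  // locking the same mutex twice is undefined behavior
    std::scoped_lock lock(from.mtx, to.mtx);
    from.bytes -= n;
    to.bytes += n;
}

// Runs opposing transfers on two threads and returns the combined total,
// which stays constant if every transfer was applied atomically.
long runOpposingTransfers(int iterations) {
    Pool a, b;
    a.bytes = 1000;
    b.bytes = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.bytes + b.bytes;
}
```

With two nested lock_guard objects in place of the scoped_lock, the same opposing calls could block forever once each thread holds one mutex and waits on the other.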

Changed

Enhanced operator interface: Execution context now includes workspace and profiling info
Improved memory pool: Stream-aware allocation for better performance
Better build configuration: Optional backend integration via CMake flags

Initial implementation of Mini-ImagePipe framework
Core components:
- MemoryManager: Pinned memory pool with best-fit allocation
- TaskGraph: DAG topology management with cycle detection
- DAGScheduler: Multi-stream concurrent execution
- Pipeline: End-to-end pipeline builder
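
Cycle detection on a DAG, as mentioned for TaskGraph, is commonly done with Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges, and report a cycle if some nodes can never be removed. The sketch below shows that general technique, not necessarily the method TaskGraph uses; hasCycle() and its index-based edge list are assumptions made for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Returns true if the directed graph contains a cycle.
// Nodes are numbered 0..nodeCount-1; each edge is a (from, to) pair.
// Assumes every edge index is smaller than nodeCount.
bool hasCycle(std::size_t nodeCount,
              const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> successors(nodeCount);
    std::vector<std::size_t> inDegree(nodeCount, 0);
    for (const auto& edge : edges) {
        successors[edge.first].push_back(edge.second);
        ++inDegree[edge.second];
    }

    // Start from nodes that depend on nothing.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < nodeCount; ++i) {
        if (inDegree[i] == 0) ready.push(i);
    }

    // Each processed node releases its successors.
    std::size_t processed = 0;
    while (!ready.empty()) {
        const std::size_t node = ready.front();
        ready.pop();
        ++processed;
        for (std::size_t next : successors[node]) {
            if (--inDegree[next] == 0) ready.push(next);
        }
    }

    // Nodes on a cycle never reach in-degree zero, so they are never processed.
    return processed != nodeCount;
}
```

The order in which nodes leave the queue is also a valid topological order, so the same pass can feed a scheduler.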
Operators:
- GaussianBlurOperator: Separable filter with shared memory optimization
- SobelOperator: Edge detection with gradient magnitude
- ResizeOperator: Bilinear and nearest-neighbor interpolation
- ColorConvertOperator: RGB/Gray/BGR conversions
Testing: Property-based testing with 100 iterations per test
Documentation: Bilingual (EN/ZH-CN) README and API docs
CI/CD: GitHub Actions with clang-format check and CUDA build