Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.1.0] - 2026-04-30
Added
Runtime Foundation v2
- Multi-input pipeline execution: Correct dependency routing for fork-join and merge topologies
- Operator workspace lifecycle: initialize(), shutdown(), getWorkspaceRequirements() hooks
- Stream-aware memory allocation: Device allocation/free with CUDA stream parameter
- Profiling context propagation: Runtime execution context carries profiling state
Throughput Engine v2
- Async device allocator mode controls: Support for cudaMallocAsync when available
- DAG scheduler graph capture/replay state: CUDA graph optimization hooks
- Real batch execution path: executeBatch() with proper metadata and invocation semantics
- Benchmark pipeline example: examples/benchmark_pipeline.cpp
Ecosystem Extensions v2
- CV-CUDA operator surface: Optional CvcudaResizeOperator (dependency-gated)
- TensorRT inference operator surface: Optional TensorRtInferenceOperator (dependency-gated)
- GStreamer/DeepStream bridge surface: Optional integration (dependency-gated)
- Backend capability registry: Query available ecosystem backends at runtime
Fixed
- Memory manager deadlock prevention: Replace nested lock_guard with scoped_lock
- Null pointer checks for malloc: Proper error handling on allocation failures
- CUDA error handling improvements: Consistent error checking across operators
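The deadlock fix above can be illustrated with a minimal C++17 sketch. The Pool struct and moveCapacity() function are hypothetical stand-ins and are not part of the project's MemoryManager API; the sketch only shows why a single std::scoped_lock over both mutexes is safer than two nested std::lock_guard objects.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a memory pool guarded by its own mutex.
struct Pool {
    std::mutex mutex;
    std::size_t freeBytes = 0;
};

// Moves `bytes` of capacity from one pool to another.
// std::scoped_lock acquires both mutexes with a deadlock-avoidance
// algorithm, so moveCapacity(a, b, n) and moveCapacity(b, a, n) can run
// concurrently without the lock-order inversion that two nested
// std::lock_guard objects would allow.
bool moveCapacity(Pool& from, Pool& to, std::size_t bytes) {
    assert(&from != &to);  // locking the same mutex twice is undefined behavior
    std::scoped_lock lock(from.mutex, to.mutex);
    if (from.freeBytes < bytes) {
        return false;  // not enough capacity; both pools are left unchanged
    }
    from.freeBytes -= bytes;
    to.freeBytes += bytes;
    return true;
}
```

With nested lock_guard, one thread locking a then b while another locks b then a can block forever; the single scoped_lock removes the dependence on acquisition order.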
Changed
- Enhanced operator interface: Execution context now includes workspace and profiling info
- Improved memory pool: Stream-aware allocation for better performance
- Better build configuration: Optional backend integration via CMake flags
[1.0.0] - 2026-04-23
Added
- Initial implementation of Mini-ImagePipe framework
- Core components:
  - MemoryManager: Pinned memory pool with best-fit allocation
  - TaskGraph: DAG topology management with cycle detection
  - DAGScheduler: Multi-stream concurrent execution
  - Pipeline: End-to-end pipeline builder
- Operators:
  - GaussianBlurOperator: Separable filter with shared memory optimization
  - SobelOperator: Edge detection with gradient magnitude
  - ResizeOperator: Bilinear and nearest-neighbor interpolation
  - ColorConvertOperator: RGB/Gray/BGR conversions
- Testing: Property-based testing with 100 iterations per test
- Documentation: Bilingual (EN/ZH-CN) README and API docs
- CI/CD: GitHub Actions with clang-format check and CUDA build
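For readers unfamiliar with the cycle detection mentioned for TaskGraph, the following is a minimal sketch of one common approach, Kahn's algorithm. The hasCycle() function and its edge-list input are assumptions made for illustration; the changelog does not state which algorithm or interface TaskGraph actually uses.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Returns true if the directed graph contains a cycle.
// Precondition (assumed): every edge endpoint is less than nodeCount.
// Kahn's algorithm: repeatedly remove nodes with no remaining incoming
// edges. If some nodes are never removed, they lie on or behind a cycle.
bool hasCycle(std::size_t nodeCount,
              const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> successors(nodeCount);
    std::vector<std::size_t> inDegree(nodeCount, 0);
    for (const auto& [from, to] : edges) {
        successors[from].push_back(to);
        ++inDegree[to];
    }

    // Start with every node that has no dependencies.
    std::queue<std::size_t> ready;
    for (std::size_t n = 0; n < nodeCount; ++n) {
        if (inDegree[n] == 0) {
            ready.push(n);
        }
    }

    std::size_t visited = 0;
    while (!ready.empty()) {
        const std::size_t n = ready.front();
        ready.pop();
        ++visited;
        // Releasing n may make its successors ready.
        for (const std::size_t next : successors[n]) {
            if (--inDegree[next] == 0) {
                ready.push(next);
            }
        }
    }
    return visited != nodeCount;
}
```

A fork-join graph such as 0→1, 0→2, 1→3, 2→3 is accepted, while 0→1→2→0 is rejected. The order in which nodes leave the queue is also a valid execution order, which is why this technique pairs naturally with a DAG scheduler.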