Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.1.0] - 2026-04-30

Added

Runtime Foundation v2

  • Multi-input pipeline execution: Correct dependency routing for fork-join and merge topologies
  • Operator workspace lifecycle: initialize(), shutdown(), getWorkspaceRequirements() hooks
  • Stream-aware memory allocation: Device allocation/free with CUDA stream parameter
  • Profiling context propagation: Runtime execution context carries profiling state
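
For readers unfamiliar with fork-join routing, the idea is that a node with several inputs may only run after every producer has finished. The sketch below illustrates this with Kahn's algorithm; it is not the project's scheduler, and the names `Edge` and `executionOrder` are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical edge representation: (producer index, consumer index).
using Edge = std::pair<std::size_t, std::size_t>;

// Returns an order in which every node appears after all of its producers.
// Returns std::nullopt if the graph has a cycle or an edge is out of range.
inline std::optional<std::vector<std::size_t>>
executionOrder(std::size_t nodeCount, const std::vector<Edge>& edges) {
    std::vector<std::vector<std::size_t>> consumers(nodeCount);
    std::vector<std::size_t> pendingInputs(nodeCount, 0);
    for (const auto& [from, to] : edges) {
        if (from >= nodeCount || to >= nodeCount) return std::nullopt;
        consumers[from].push_back(to);
        ++pendingInputs[to];  // a join node counts one pending input per producer
    }

    // Nodes with no pending inputs are ready to run.
    std::queue<std::size_t> ready;
    for (std::size_t n = 0; n < nodeCount; ++n) {
        if (pendingInputs[n] == 0) ready.push(n);
    }

    std::vector<std::size_t> order;
    order.reserve(nodeCount);
    while (!ready.empty()) {
        const std::size_t n = ready.front();
        ready.pop();
        order.push_back(n);
        // Finishing n satisfies one input of each consumer.
        for (std::size_t c : consumers[n]) {
            if (--pendingInputs[c] == 0) ready.push(c);
        }
    }

    // Nodes left unscheduled can only be part of a cycle.
    if (order.size() != nodeCount) return std::nullopt;
    return order;
}
```

With edges 0→1, 0→2, 1→3, 2→3 (one fork, one join), node 3 is scheduled only after both 1 and 2.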

Throughput Engine v2

  • Async device allocator mode controls: Support for cudaMallocAsync when available
  • DAG scheduler graph capture/replay state: CUDA graph optimization hooks
  • Real batch execution path: executeBatch() with proper metadata and invocation semantics
  • Benchmark pipeline example: examples/benchmark_pipeline.cpp

Ecosystem Extensions v2

  • CV-CUDA operator surface: Optional CvcudaResizeOperator (dependency-gated)
  • TensorRT inference operator surface: Optional TensorRtInferenceOperator (dependency-gated)
  • GStreamer/DeepStream bridge surface: Optional integration (dependency-gated)
  • Backend capability registry: Query available ecosystem backends at runtime

Fixed

  • Memory manager deadlock prevention: Replaced nested lock_guard acquisitions with scoped_lock
  • Null pointer checks for malloc: Proper error handling on allocation failures
  • CUDA error handling improvements: Consistent error checking across operators
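
To show why the scoped_lock change matters: two nested std::lock_guard acquisitions can deadlock when two threads take the same pair of mutexes in opposite order, whereas std::scoped_lock acquires all of its mutexes with a deadlock-avoidance algorithm. The following is a minimal sketch, not the memory manager's actual code; `Pool` and `transfer` are hypothetical names.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical pool bookkeeping guarded by its own mutex.
struct Pool {
    std::mutex mutex;
    std::size_t bytesFree = 0;
};

// Moves `bytes` of capacity from one pool to another.
// The deadlock-prone version would nest two guards:
//     std::lock_guard<std::mutex> a(from.mutex);
//     std::lock_guard<std::mutex> b(to.mutex);
// so transfer(x, y) racing with transfer(y, x) could block forever.
inline bool transfer(Pool& from, Pool& to, std::size_t bytes) {
    // Locking the same std::mutex twice is undefined behavior, so reject it.
    if (&from == &to) return false;

    // std::scoped_lock (C++17) locks both mutexes without risk of deadlock,
    // regardless of argument order, and releases them at scope exit.
    std::scoped_lock lock(from.mutex, to.mutex);
    if (from.bytesFree < bytes) return false;
    from.bytesFree -= bytes;
    to.bytesFree += bytes;
    return true;
}
```

Because the lock order no longer depends on argument order, one thread calling `transfer(a, b, n)` while another calls `transfer(b, a, n)` cannot deadlock.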

Changed

  • Operator interface: Execution context now includes workspace and profiling info
  • Memory pool: Stream-aware allocation for better performance
  • Build configuration: Optional backend integration via CMake flags

[1.0.0] - 2026-04-23

Added

  • Initial implementation of Mini-ImagePipe framework
  • Core components:
    • MemoryManager: Pinned memory pool with best-fit allocation
    • TaskGraph: DAG topology management with cycle detection
    • DAGScheduler: Multi-stream concurrent execution
    • Pipeline: End-to-end pipeline builder
  • Operators:
    • GaussianBlurOperator: Separable filter with shared memory optimization
    • SobelOperator: Edge detection with gradient magnitude
    • ResizeOperator: Bilinear and nearest-neighbor interpolation
    • ColorConvertOperator: RGB/Gray/BGR conversions
  • Testing: Property-based testing with 100 iterations per test
  • Documentation: Bilingual (EN/ZH-CN) README and API docs
  • CI/CD: GitHub Actions with clang-format check and CUDA build
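
As a rough illustration of the best-fit policy mentioned for MemoryManager above, the sketch below picks the smallest free block that can hold a request. It tracks offsets only and does not coalesce neighbors; `BestFitFreeList` is a hypothetical name, and the real pool's layout may differ.

```cpp
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical best-fit free list over one contiguous buffer.
// Only offsets are tracked; a real pinned pool would back them with memory
// obtained from an API such as cudaHostAlloc.
class BestFitFreeList {
public:
    explicit BestFitFreeList(std::size_t capacity) {
        if (capacity > 0) freeBySize_.emplace(capacity, 0);
    }

    // Returns the offset of the smallest free block that fits `size`,
    // or std::nullopt if no block is large enough.
    std::optional<std::size_t> allocate(std::size_t size) {
        if (size == 0) return std::nullopt;
        auto it = freeBySize_.lower_bound(size);  // smallest block >= size
        if (it == freeBySize_.end()) return std::nullopt;

        const std::size_t blockSize = it->first;
        const std::size_t offset = it->second;
        freeBySize_.erase(it);

        // Return the unused tail of the block to the free list.
        if (blockSize > size) {
            freeBySize_.emplace(blockSize - size, offset + size);
        }
        return offset;
    }

    // Marks a previously allocated range as free again (no coalescing).
    void release(std::size_t offset, std::size_t size) {
        if (size > 0) freeBySize_.emplace(size, offset);
    }

private:
    // Free blocks keyed by size, so lower_bound finds the best fit.
    std::multimap<std::size_t, std::size_t> freeBySize_;
};
```

Keying the free list by block size keeps the best-fit lookup logarithmic in the number of free blocks; a production pool would also merge adjacent free ranges to limit fragmentation.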