Memory Manager API

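Complete reference for the MemoryManager class, which manages pinned host memory, device memory, asynchronous host-device transfers, and the memory pools behind them.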

Class Definition

cpp

class MemoryManager {
public:
    static MemoryManager& getInstance();

    // Pinned host memory
    void* allocatePinned(size_t size);
    void freePinned(void* ptr);

    // Device memory
    void* allocateDevice(size_t size);
    void freeDevice(void* ptr);

    // Asynchronous transfers
    cudaError_t copyToDeviceAsync(void* dst, const void* src, 
                                  size_t size, cudaStream_t stream);
    cudaError_t copyToHostAsync(void* dst, const void* src, 
                                size_t size, cudaStream_t stream);

    // Pool management
    void setPinnedPoolSize(size_t size);
    void setDevicePoolSize(size_t size);

    // Allocator mode (v2)
    void setDeviceAllocatorMode(DeviceAllocatorMode mode);
    DeviceAllocatorMode getRequestedDeviceAllocatorMode() const;
    DeviceAllocatorMode getEffectiveDeviceAllocatorMode() const;
    bool supportsAsyncDeviceAllocator() const;

    // Cleanup
    void shutdown();
};

Methods

Singleton Access

cpp

MemoryManager& mm = MemoryManager::getInstance();

Pinned Memory

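Pinned (page-locked) host memory cannot be paged out by the operating system, so the GPU can read and write it directly via DMA. This speeds up host-device transfers and lets asynchronous copies overlap with host work: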

cpp

void* h_data = mm.allocatePinned(size);
// ... use memory ...
mm.freePinned(h_data);
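
Because every allocatePinned call needs a matching freePinned, an RAII wrapper can make the pairing automatic. The following is a minimal sketch, not part of the documented API: PinnedDeleter, PinnedBuffer, and makePinned are hypothetical names, and the code is templated on the manager type so it compiles without CUDA. It assumes allocatePinned returns a null pointer on failure, which this reference does not state.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical deleter: returns a pinned buffer to the manager that produced it.
template <class Manager>
struct PinnedDeleter {
    Manager* mm;
    void operator()(void* ptr) const noexcept {
        mm->freePinned(ptr);  // unique_ptr only calls this for non-null pointers
    }
};

// Hypothetical alias: an owning handle to a pinned buffer.
template <class Manager>
using PinnedBuffer = std::unique_ptr<void, PinnedDeleter<Manager>>;

// Allocates `size` bytes of pinned memory and ties its lifetime to the handle.
template <class Manager>
PinnedBuffer<Manager> makePinned(Manager& mm, std::size_t size) {
    return PinnedBuffer<Manager>(mm.allocatePinned(size),
                                 PinnedDeleter<Manager>{&mm});
}
```

With this in place, `auto h_data = makePinned(mm, size);` releases the buffer when `h_data` goes out of scope, and `h_data.get()` yields the raw pointer for the transfer calls.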

Device Memory

cpp

void* d_data = mm.allocateDevice(size);
// ... use memory ...
mm.freeDevice(d_data);

Async Transfers

cpp

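cudaStream_t stream;
cudaStreamCreate(&stream);

// Host to device: h_data should be pinned for the copy to be truly asynchronous
cudaError_t err = mm.copyToDeviceAsync(d_data, h_data, size, stream);
if (err != cudaSuccess) { /* handle error, e.g. log cudaGetErrorString(err) */ }

// Device to host: queued on the same stream, so it runs after the copy above
err = mm.copyToHostAsync(h_data, d_data, size, stream);
if (err != cudaSuccess) { /* handle error */ }

// Wait for both copies to finish before reading h_data on the host
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);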

Pool Configuration

cpp

MemoryManager& mm = MemoryManager::getInstance();

// Configure pool sizes before first allocation
mm.setPinnedPoolSize(128 * 1024 * 1024);  // 128 MiB
mm.setDevicePoolSize(512 * 1024 * 1024);  // 512 MiB
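
A size expression such as `512 * 1024 * 1024` is evaluated in `int` arithmetic. That is fine for the values shown here, but with a 32-bit `int` the same pattern overflows from 2048 MiB upward. A small helper that does the multiplication in `size_t` avoids the problem. The name MiB is a hypothetical helper, not part of the MemoryManager API.

```cpp
#include <cstddef>

// Hypothetical helper: converts mebibytes to bytes using size_t arithmetic,
// so large pool sizes do not overflow int.
constexpr std::size_t MiB(std::size_t n) {
    return n * 1024 * 1024;
}

static_assert(MiB(128) == 134217728, "128 MiB expressed in bytes");
```

A pool size can then be written as `mm.setDevicePoolSize(MiB(512));`.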

Thread Safety

MemoryManager is thread-safe. All methods can be called concurrently from multiple threads.

Best-Fit Allocation

The memory manager uses a best-fit allocation strategy:

Searches for the smallest block that fits the request
Splits larger blocks if necessary
Coalesces adjacent free blocks on deallocation
Minimizes fragmentation over time
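
The search, split, and coalesce steps listed above can be sketched with a toy allocator. This is an illustration of the general best-fit technique, not the MemoryManager implementation: BestFitPool and kNoBlock are hypothetical names, and the pool hands out offsets into an imaginary range instead of real pointers so that it runs without a GPU.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Sentinel returned when no free block is large enough (hypothetical name).
constexpr std::size_t kNoBlock = static_cast<std::size_t>(-1);

// Toy best-fit allocator over the offset range [0, capacity).
class BestFitPool {
public:
    explicit BestFitPool(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // Returns the offset of a block of `size` bytes, or kNoBlock if none fits.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return kNoBlock;
        // Search: pick the smallest free block that still fits the request.
        auto best = free_.end();
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= size &&
                (best == free_.end() || it->second < best->second)) {
                best = it;
            }
        }
        if (best == free_.end()) return kNoBlock;

        const std::size_t offset = best->first;
        const std::size_t remaining = best->second - size;
        free_.erase(best);
        // Split: keep the unused tail of the chosen block on the free list.
        if (remaining > 0) free_[offset + size] = remaining;
        used_[offset] = size;
        return offset;
    }

    // Returns a block to the pool and coalesces it with adjacent free blocks.
    void deallocate(std::size_t offset) {
        auto u = used_.find(offset);
        if (u == used_.end()) return;  // unknown offset: ignore
        std::size_t size = u->second;
        used_.erase(u);

        auto next = free_.lower_bound(offset);
        // Merge with a free block that ends exactly where this one starts.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                free_.erase(prev);
            }
        }
        // Merge with a free block that starts exactly where this one ends.
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            free_.erase(next);
        }
        free_[offset] = size;
    }

    std::size_t freeBlockCount() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size of free blocks
    std::map<std::size_t, std::size_t> used_;  // offset -> size of live blocks
};
```

The linear scan keeps the sketch short. A production allocator would more likely index free blocks by size so the best fit can be found without visiting every block.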

Memory Pool Architecture

┌─────────────────────────────────────────────────────────────┐
│                    MemoryManager                             │
├─────────────────────────────────────────────────────────────┤
│  Pinned Memory Pool          │  Device Memory Pool          │
│  ┌─────────────────────┐     │  ┌─────────────────────┐     │
│  │ Block 1 (256KB)     │     │  │ Block 1 (1MB)       │     │
│  │ Block 2 (512KB)     │     │  │ Block 2 (2MB)       │     │
│  │ Block 3 (1MB)       │     │  │ Block 3 (4MB)       │     │
│  │ ...                 │     │  │ ...                 │     │
│  └─────────────────────┘     │  └─────────────────────┘     │
├─────────────────────────────────────────────────────────────┤
│  Benefits:                                                   │
│  • Reuse across pipeline executions                          │
│  • Reduced allocation overhead                               │
│  • Lower fragmentation                                       │
└─────────────────────────────────────────────────────────────┘

v2 Runtime Extensions

Stream-Ordered Allocation (CUDA 11.2+)

cpp

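mm.setDeviceAllocatorMode(DeviceAllocatorMode::ASYNC_STREAM_ORDERED);

// Stream-taking overloads (not listed in the class definition above):
// the allocation and the free are ordered with other work queued on `stream`
void* d_data = mm.allocateDevice(size, stream);
mm.freeDevice(d_data, stream);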

Async Memory Pools

When the CUDA runtime and device support them, the memory manager uses CUDA's native asynchronous memory pools (cudaMemPool):

cpp

// Automatically uses cudaMemPool if supported
mm.setDeviceAllocatorMode(DeviceAllocatorMode::CUDA_ASYNC_POOL);
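
The class definition exposes both a requested and an effective allocator mode, plus supportsAsyncDeviceAllocator(). One plausible reading is that the manager falls back to its pooled allocator when the async allocator is unavailable. The sketch below illustrates that assumed policy only. AllocatorModeState is a hypothetical stand-in for the relevant part of MemoryManager, and POOLED is a hypothetical name for the default mode, which this reference does not spell out.

```cpp
// Mirrors the documented enum name; POOLED is a hypothetical value name.
enum class DeviceAllocatorMode { POOLED, ASYNC_STREAM_ORDERED };

// Hypothetical stand-in for the mode-tracking part of MemoryManager.
class AllocatorModeState {
public:
    // `asyncSupported` would come from a runtime capability query, for example
    // cudaDeviceGetAttribute with cudaDevAttrMemoryPoolsSupported.
    explicit AllocatorModeState(bool asyncSupported)
        : asyncSupported_(asyncSupported) {}

    void setDeviceAllocatorMode(DeviceAllocatorMode mode) { requested_ = mode; }

    DeviceAllocatorMode getRequestedDeviceAllocatorMode() const {
        return requested_;
    }

    // Assumed policy: fall back to the pooled allocator when the async
    // allocator was requested but is not supported.
    DeviceAllocatorMode getEffectiveDeviceAllocatorMode() const {
        if (requested_ == DeviceAllocatorMode::ASYNC_STREAM_ORDERED &&
            !asyncSupported_) {
            return DeviceAllocatorMode::POOLED;
        }
        return requested_;
    }

    bool supportsAsyncDeviceAllocator() const { return asyncSupported_; }

private:
    bool asyncSupported_;
    DeviceAllocatorMode requested_ = DeviceAllocatorMode::POOLED;
};
```

Under this reading, callers that depend on stream-ordered behavior would compare the requested and effective modes after calling setDeviceAllocatorMode instead of assuming the request was honored.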
