Memory Manager API
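Complete reference for MemoryManager, a singleton that manages pinned host memory, device memory, asynchronous host-device transfers, and the memory pools behind them.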
Class Definition
```cpp
class MemoryManager {
public:
    static MemoryManager& getInstance();

    // Pinned host memory
    void* allocatePinned(size_t size);
    void freePinned(void* ptr);

    // Device memory
    void* allocateDevice(size_t size);
    void freeDevice(void* ptr);

    // Asynchronous transfers
    cudaError_t copyToDeviceAsync(void* dst, const void* src,
                                  size_t size, cudaStream_t stream);
    cudaError_t copyToHostAsync(void* dst, const void* src,
                                size_t size, cudaStream_t stream);

    // Pool management
    void setPinnedPoolSize(size_t size);
    void setDevicePoolSize(size_t size);

    // Allocator mode (v2)
    void setDeviceAllocatorMode(DeviceAllocatorMode mode);
    DeviceAllocatorMode getRequestedDeviceAllocatorMode() const;
    DeviceAllocatorMode getEffectiveDeviceAllocatorMode() const;
    bool supportsAsyncDeviceAllocator() const;

    // Cleanup
    void shutdown();
};
```

Methods
Singleton Access
```cpp
MemoryManager& mm = MemoryManager::getInstance();
```

Pinned Memory
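Pinned (page-locked) host memory cannot be swapped out by the OS, so the GPU can read and write it directly via DMA. This makes transfers faster than from pageable memory: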
```cpp
void* h_data = mm.allocatePinned(size);
// ... use memory ...
mm.freePinned(h_data);
```

Device Memory
```cpp
void* d_data = mm.allocateDevice(size);
// ... use memory ...
mm.freeDevice(d_data);
```

Async Transfers
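```cpp
cudaStream_t stream;
cudaStreamCreate(&stream);

// Host to device
cudaError_t err = mm.copyToDeviceAsync(d_data, h_data, size, stream);
if (err != cudaSuccess) { /* handle error, e.g. log cudaGetErrorString(err) */ }

// Device to host; enqueued on the same stream, so it runs after the copy above
err = mm.copyToHostAsync(h_data, d_data, size, stream);
if (err != cudaSuccess) { /* handle error */ }

// Block until both transfers have completed before reading h_data
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);
```

Pool Configuration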
```cpp
MemoryManager& mm = MemoryManager::getInstance();

// Configure pool sizes before first allocation
mm.setPinnedPoolSize(128 * 1024 * 1024);  // 128 MiB
mm.setDevicePoolSize(512 * 1024 * 1024);  // 512 MiB
```

Thread Safety
MemoryManager is thread-safe. All methods can be called concurrently from multiple threads.
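As an illustration of what this guarantee means for callers, the sketch below shares one manager across several threads with no caller-side locking. `StubManager` and `runWorkers` are hypothetical stand-ins written for this example (a mutex around `malloc`/`free`); how MemoryManager achieves thread safety internally is not described here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for MemoryManager: plain malloc/free guarded by
// a mutex. It is an assumption for illustration, not the real class.
class StubManager {
public:
    void* allocate(std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++live_;
        return std::malloc(size);
    }

    void release(void* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        --live_;
        std::free(ptr);
    }

    int liveAllocations() {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_;
    }

private:
    std::mutex mutex_;
    int live_ = 0;  // number of allocations not yet released
};

// Starts `threads` workers that each allocate and release `iterations`
// buffers on the shared manager, then returns the live allocation count.
int runWorkers(StubManager& mm, int threads, int iterations) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&mm, iterations] {
            for (int i = 0; i < iterations; ++i) {
                void* p = mm.allocate(256);
                mm.release(p);
            }
        });
    }
    for (auto& w : workers) w.join();
    return mm.liveAllocations();
}
```

Because every mutation happens under the lock, the live count returns to zero once all workers have joined, regardless of how the threads interleave.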
Best-Fit Allocation
The memory manager uses a best-fit allocation strategy:
- Searches the free blocks for the smallest one that fits the request
- Splits the chosen block when it is larger than the request
- Coalesces adjacent free blocks on deallocation
- Reduces fragmentation over time
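The steps above can be sketched as a small free-list allocator over byte offsets. This is an illustrative model, not the MemoryManager implementation: the `BestFitPool` class, its `allocate`/`release` methods, the `kNoBlock` sentinel, and the use of `std::map` keyed by offset are all assumptions made for the example.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical sentinel returned when no free block is large enough.
constexpr std::size_t kNoBlock = static_cast<std::size_t>(-1);

// Hypothetical model of a best-fit pool. It manages offsets into one
// contiguous region rather than real pinned or device memory.
class BestFitPool {
public:
    explicit BestFitPool(std::size_t capacity) { free_[0] = capacity; }

    // Returns the offset of the allocated block, or kNoBlock on failure.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return kNoBlock;
        auto best = free_.end();
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            // Best fit: keep the smallest free block that still fits.
            if (it->second >= size &&
                (best == free_.end() || it->second < best->second)) {
                best = it;
            }
        }
        if (best == free_.end()) return kNoBlock;

        const std::size_t offset = best->first;
        const std::size_t remainder = best->second - size;
        free_.erase(best);
        // Split: the unused tail goes back to the free list.
        if (remainder > 0) free_[offset + size] = remainder;
        used_[offset] = size;
        return offset;
    }

    // Frees a block and coalesces it with adjacent free neighbours.
    void release(std::size_t offset) {
        auto u = used_.find(offset);
        if (u == used_.end()) return;  // unknown offset: ignore
        std::size_t size = u->second;
        used_.erase(u);

        auto next = free_.lower_bound(offset);
        // Merge with the free block that starts right after this one.
        if (next != free_.end() && next->first == offset + size) {
            size += next->second;
            next = free_.erase(next);
        }
        // Merge with the free block that ends right before this one.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += size;
                return;
            }
        }
        free_[offset] = size;
    }

    std::size_t freeBlockCount() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size, free blocks
    std::map<std::size_t, std::size_t> used_;  // offset -> size, live blocks
};
```

In this model, with free blocks of 300, 100, and 400 bytes, an 80-byte request is served from the 100-byte block rather than the first block that happens to fit. The linear scan keeps the sketch short; a production allocator would more likely index free blocks by size.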
Memory Pool Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        MemoryManager                        │
├─────────────────────────────────────────────────────────────┤
│  Pinned Memory Pool          │  Device Memory Pool          │
│  ┌─────────────────────┐     │  ┌─────────────────────┐     │
│  │ Block 1 (256KB)     │     │  │ Block 1 (1MB)       │     │
│  │ Block 2 (512KB)     │     │  │ Block 2 (2MB)       │     │
│  │ Block 3 (1MB)       │     │  │ Block 3 (4MB)       │     │
│  │ ...                 │     │  │ ...                 │     │
│  └─────────────────────┘     │  └─────────────────────┘     │
├─────────────────────────────────────────────────────────────┤
│  Benefits:                                                  │
│  • Reuse across pipeline executions                         │
│  • Reduced allocation overhead                              │
│  • Lower fragmentation                                      │
└─────────────────────────────────────────────────────────────┘
```

v2 Runtime Extensions
Stream-Ordered Allocation (CUDA 11.2+)
```cpp
mm.setDeviceAllocatorMode(DeviceAllocatorMode::ASYNC_STREAM_ORDERED);

// Stream-taking overloads: the allocation and the free are ordered on `stream`
void* d_data = mm.allocateDevice(size, stream);
mm.freeDevice(d_data, stream);
```

Async Memory Pools
When available, the memory manager uses CUDA's native async memory pools:
```cpp
// Uses a native CUDA memory pool (cudaMemPool_t) when the device supports it
mm.setDeviceAllocatorMode(DeviceAllocatorMode::CUDA_ASYNC_POOL);
```
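The class exposes both a requested and an effective allocator mode, which suggests that a request can be downgraded when the device lacks async allocation support. The sketch below models one plausible fallback rule. It is an assumption, not documented behavior: `AllocatorModeModel` is a hypothetical stand-in, and `POOLED` is an invented name for the non-async default, since only `ASYNC_STREAM_ORDERED` and `CUDA_ASYNC_POOL` appear in this reference.

```cpp
// ASYNC_STREAM_ORDERED and CUDA_ASYNC_POOL come from the examples above;
// POOLED is a hypothetical name for the non-async default mode.
enum class DeviceAllocatorMode { POOLED, ASYNC_STREAM_ORDERED, CUDA_ASYNC_POOL };

// Hypothetical model of how a requested mode might map to an effective one.
class AllocatorModeModel {
public:
    explicit AllocatorModeModel(bool asyncSupported)
        : asyncSupported_(asyncSupported) {}

    void setDeviceAllocatorMode(DeviceAllocatorMode mode) { requested_ = mode; }

    DeviceAllocatorMode getRequestedDeviceAllocatorMode() const {
        return requested_;
    }

    bool supportsAsyncDeviceAllocator() const { return asyncSupported_; }

    // Assumed rule: an async mode falls back to POOLED when unsupported.
    DeviceAllocatorMode getEffectiveDeviceAllocatorMode() const {
        const bool wantsAsync = requested_ != DeviceAllocatorMode::POOLED;
        return (wantsAsync && !asyncSupported_) ? DeviceAllocatorMode::POOLED
                                                : requested_;
    }

private:
    bool asyncSupported_;
    DeviceAllocatorMode requested_ = DeviceAllocatorMode::POOLED;
};
```

Under this assumed rule, a caller can request an async mode unconditionally and then compare the requested and effective modes to find out whether the request was honored.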