cudaMallocHost

cudaMallocHost is a CUDA runtime API function that allocates page-locked (also known as pinned) host memory. The allocation enables faster host-to-device and device-to-host data transfers compared with pageable memory, and it can be used with asynchronous copies when combined with streams. The memory remains resident in physical RAM and is not swapped out by the operating system, which reduces transfer latency and can improve bandwidth.
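A minimal sketch of allocating and releasing pinned memory, assuming a standard CUDA toolkit install (the buffer size and variable names here are illustrative, not from the original text):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float* h_data = nullptr;
    const size_t n = 1 << 20;  // arbitrary example size: 1M floats

    // Allocate page-locked (pinned) host memory.
    cudaError_t err = cudaMallocHost((void**)&h_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... use h_data as a staging buffer for host-device transfers ...

    // Pinned memory must be released with cudaFreeHost, not free().
    cudaFreeHost(h_data);
    return 0;
}
```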

The typical usage is cudaError_t cudaMallocHost(void** ptr, size_t size). On success, *ptr points to a host memory region of at least size bytes; on failure, a CUDA error code such as cudaErrorMemoryAllocation is returned. The allocated memory should be freed with cudaFreeHost(ptr) when it is no longer needed. Errors should be checked and handled in accordance with the CUDA runtime API.

Pinned memory allocated by cudaMallocHost is suitable for use as the source or destination in host-device transfers performed with cudaMemcpy or cudaMemcpyAsync. Transfers can be overlapped with computation when using streams, which can improve overall throughput. Note that kernels running on the device typically cannot directly dereference host memory allocated by cudaMallocHost unless the memory is specifically allocated with mapping capabilities (for example, via cudaHostAlloc with appropriate flags). Therefore, cudaMallocHost memory is primarily used for efficient data transfer rather than direct device access.

Common considerations include the higher cost of pinning memory, the limited amount of memory that can be pinned, and potential fragmentation from frequent allocations and deallocations. Developers should reuse allocated pinned memory when possible and release it promptly with cudaFreeHost. Related functions include cudaHostAlloc and cudaFreeHost.
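The transfer-overlap pattern described above can be sketched as follows. This is a hedged example, not a definitive recipe: the kernel, sizes, and launch configuration are placeholders chosen for illustration, and production code should check every CUDA call's return value.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: scale each element in place.
__global__ void scale(float* d, float k, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main() {
    const size_t n = 1 << 20;  // example size
    float *h_buf, *d_buf;

    // Pinned host memory lets cudaMemcpyAsync be truly asynchronous.
    cudaMallocHost((void**)&h_buf, n * sizeof(float));
    cudaMalloc(&d_buf, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Copy, kernel, and copy-back are enqueued on the same stream;
    // the host thread is free to do other work meanwhile.
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, 2.0f, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);  // wait for all queued work to finish

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```

Had h_buf been allocated with ordinary malloc, the async copies would fall back to synchronous staging behavior, which is why pinned allocation is central to this pattern.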