cudaMallocHost

cudaMallocHost is a CUDA runtime API function that allocates page-locked (also known as pinned) host memory. The allocation enables faster host-to-device and device-to-host data transfers compared with pageable memory, and it can be used with asynchronous copies when combined with streams. The memory remains resident in physical RAM and is not swapped out by the operating system, which reduces transfer latency and can improve bandwidth.
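A minimal sketch of allocating and releasing pinned memory, assuming a standard CUDA toolkit install (the buffer size and variable names here are illustrative, not from the original text):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float* h_data = nullptr;
    const size_t n = 1 << 20;  // arbitrary example size: 1M floats

    // Allocate page-locked (pinned) host memory.
    cudaError_t err = cudaMallocHost((void**)&h_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... use h_data as a staging buffer for host-device transfers ...

    // Pinned memory must be released with cudaFreeHost, not free().
    cudaFreeHost(h_data);
    return 0;
}
```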

The typical usage is cudaError_t cudaMallocHost(void** ptr, size_t size). On success, *ptr points to a host memory region of at least size bytes; on failure, a CUDA error code such as cudaErrorMemoryAllocation is returned. The allocated memory should be freed with cudaFreeHost(ptr) when it is no longer needed. Errors should be checked and handled in accordance with the CUDA runtime API.

Pinned memory allocated by cudaMallocHost is suitable for use as the source or destination in host-device transfers performed with cudaMemcpy or cudaMemcpyAsync. Transfers can be overlapped with computation when using streams, which can improve overall throughput. Note that kernels running on the device typically cannot directly dereference host memory allocated by cudaMallocHost unless the memory is specifically allocated with mapping capabilities (for example, via cudaHostAlloc with appropriate flags). Therefore, cudaMallocHost memory is primarily used for efficient data transfer rather than direct device access.

Common considerations include the higher cost of pinning memory, the limited amount of memory that can be pinned, and potential fragmentation from frequent allocations and deallocations. Developers should reuse allocated pinned memory when possible and release it promptly with cudaFreeHost. Related functions include cudaHostAlloc and cudaFreeHost.
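The transfer-overlap pattern described above can be sketched as follows. This is a hedged example, not a definitive recipe: the kernel, sizes, and launch configuration are placeholders chosen for illustration, and production code should check every CUDA call's return value.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: scale each element in place.
__global__ void scale(float* d, float k, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main() {
    const size_t n = 1 << 20;  // example size
    float *h_buf, *d_buf;

    // Pinned host memory lets cudaMemcpyAsync be truly asynchronous.
    cudaMallocHost((void**)&h_buf, n * sizeof(float));
    cudaMalloc(&d_buf, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Copy, kernel, and copy-back are enqueued on the same stream;
    // the host thread is free to do other work meanwhile.
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, 2.0f, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);  // wait for all queued work to finish

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```

Had h_buf been allocated with ordinary malloc, the async copies would fall back to synchronous staging behavior, which is why pinned allocation is central to this pattern.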