CUDA

CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and programming model developed by NVIDIA that enables developers to use NVIDIA GPUs for general-purpose computing (GPGPU). CUDA provides a software abstraction for writing kernels that run on the GPU and for managing data transfers between host memory and device memory.
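The host-device split described above can be illustrated with a minimal sketch: a vector-addition kernel, host-to-device copies, a kernel launch, and a copy back. This is an illustrative example, not from the original text; error checking on the CUDA API calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device buffers and host-to-device transfers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch a grid of blocks, 256 threads per block.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back; cudaMemcpy synchronizes with the kernel.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // each element is 1.0 + 2.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

On a system with the CUDA Toolkit installed, a file like this would be compiled with `nvcc vec_add.cu -o vec_add` and run on an NVIDIA GPU.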

The CUDA platform includes a compiler, libraries, and tools as part of the CUDA Toolkit. Kernels are written in languages such as CUDA C/C++ (and, in some cases, CUDA Fortran), compiled with the nvcc compiler, and launched from host code. Programs expose a host-device execution model where the CPU coordinates work for many parallel GPU threads.

Execution is organized into grids of thread blocks; each thread executes a kernel on a streaming multiprocessor of the GPU. CUDA exposes a hierarchy of memory types, including global, shared, constant, and texture memory, with mechanisms for synchronization and asynchronous execution through streams and events. Unified Memory simplifies memory management by allowing the system to migrate data between host and device as needed.

NVIDIA provides a set of libraries optimized for CUDA, such as cuBLAS for linear algebra, cuFFT for fast Fourier transforms, cuDNN for deep neural networks, and NCCL for multi-GPU communication. The CUDA Toolkit also includes profiling and debugging tools like Nsight and nvprof.

CUDA is widely used for high-performance computing, scientific simulations, and accelerated machine learning workloads. Adoption is generally tied to NVIDIA GPUs, and portable alternatives exist (e.g., OpenCL, SYCL). The platform continues to evolve with new GPU architectures and expanded tooling.
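
The Unified Memory mechanism mentioned above can be sketched as follows: a single `cudaMallocManaged` allocation is visible to both host and device, so the explicit `cudaMemcpy` calls of the classic model disappear. This is an illustrative sketch, not from the original text; error checking is again omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: scale each element in place.
__global__ void scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1024;
    float* x;

    // Unified Memory: one allocation accessible from both CPU and GPU;
    // the system migrates pages between host and device as needed.
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;  // written by the host

    scale<<<(n + 255) / 256, 256>>>(x, 3.0f, n);
    cudaDeviceSynchronize();  // wait for the GPU before the host reads

    printf("x[0] = %f\n", x[0]);  // each element is 1.0 * 3.0

    cudaFree(x);
    return 0;
}
```

Note the explicit `cudaDeviceSynchronize()`: kernel launches are asynchronous with respect to the host, so the host must synchronize before reading managed memory that the kernel wrote.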