rocBLAS

rocBLAS is AMD’s high-performance Basic Linear Algebra Subprograms (BLAS) library for the ROCm software platform. It provides accelerated implementations of BLAS Level 1, 2, and 3 routines for AMD GPUs in a HIP-based environment, with a focus on enabling fast matrix and vector operations for scientific computing, data analytics, and machine learning workloads on AMD hardware.

The library uses a handle-based API. Applications create a rocblas_handle, initialize it, and call functions prefixed with rocblas_ to perform operations. Example routines include rocblas_sgemm, rocblas_dgemm, rocblas_cgemm, and rocblas_zgemm, which implement single- and double-precision real and complex matrix-matrix multiplication. Batched and strided_batched variants such as rocblas_sgemm_strided_batched enable processing many matrices efficiently. Other routines cover vectors and matrices (for example, dot, axpy, scal, gemv). rocBLAS supports multiple data types, including float, double, complex float, complex double, and, on supported hardware, half precision. Computations use device pointers, and a HIP stream can be attached to a handle via rocblas_set_stream.

rocBLAS is a core component of ROCm, optimized for AMD GPUs across generations and designed to interoperate with other ROCm libraries and tools. Its API and usage resemble cuBLAS in naming and structure to facilitate porting from CUDA-based code, while maintaining architecture-specific optimizations. The library is maintained by AMD and distributed as part of the open-source ROCm stack, serving a wide range of high-performance and scientific computing applications that require efficient BLAS routines on AMD hardware.
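
To make the strided_batched idea concrete, here is a minimal host-side sketch of the arithmetic a call such as rocblas_sgemm_strided_batched is expected to perform when no transposes are requested. It is plain C++ that runs on the CPU and is not rocBLAS code: the function name `ref_sgemm_strided_batched` is hypothetical, the parameter order only loosely mirrors the BLAS convention (the handle and transpose arguments are left out), and the column-major storage it assumes is the usual BLAS layout. Consult the rocBLAS API reference for the real signature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, NOT part of rocBLAS. For each batch index i it
// computes C_i = alpha * A_i * B_i + beta * C_i, where A_i is m x k,
// B_i is k x n and C_i is m x n. No transposes are applied.
// Assumption: matrices are stored column-major, as in classic BLAS.
void ref_sgemm_strided_batched(int m, int n, int k, float alpha,
                               const float* A, int lda, std::ptrdiff_t stride_a,
                               const float* B, int ldb, std::ptrdiff_t stride_b,
                               float beta,
                               float* C, int ldc, std::ptrdiff_t stride_c,
                               int batch_count)
{
    for (int i = 0; i < batch_count; ++i) {
        // Matrix i of each operand starts a fixed stride after matrix i - 1,
        // so one base pointer plus a stride describes the whole batch.
        const float* Ai = A + i * stride_a;
        const float* Bi = B + i * stride_b;
        float*       Ci = C + i * stride_c;

        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p) {
                    // Column-major: element (r, c) lives at r + c * ld.
                    acc += Ai[row + p * lda] * Bi[p + col * ldb];
                }
                Ci[row + col * ldc] = alpha * acc + beta * Ci[row + col * ldc];
            }
        }
    }
}
```

With two 2x2 matrices per operand packed back to back, the leading dimensions are 2 and each stride is 4 elements. In the GPU library the three pointers would refer to device memory and the batch would be handled in a single call rather than a host loop.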