rocblas_dgemm

rocblas_dgemm is the double-precision general matrix-matrix multiplication routine in the ROCm library rocBLAS. It computes C = alpha * op(A) * op(B) + beta * C, where A, B, and C reside on the device. The operation op(X) is determined by transA and transB (none, transpose, or conjugate transpose; the latter is equivalent to transpose for real numbers). Following BLAS conventions, matrices are treated as column-major and the leading dimensions lda, ldb, and ldc specify the physical strides between columns.
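To make the formula and the column-major indexing concrete, here is a minimal CPU reference sketch of the same computation. It is not rocBLAS code and makes no claim about how the library implements the routine: the names ref_op, ref_at, and ref_dgemm are hypothetical, all pointers refer to host memory, and alpha and beta are passed by value for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rocblas_operation. Only two cases are modeled,
   because conjugate transpose equals transpose for real matrices. */
typedef enum { REF_OP_NONE, REF_OP_TRANSPOSE } ref_op;

/* Element (i, j) of op(X), where X is stored column-major with
   leading dimension ldx (the stride between consecutive columns). */
static double ref_at(const double *X, int ldx, ref_op op, int i, int j)
{
    return op == REF_OP_NONE ? X[i + (size_t)j * ldx]
                             : X[j + (size_t)i * ldx];
}

/* CPU reference for C = alpha * op(A) * op(B) + beta * C.
   op(A) is m-by-k, op(B) is k-by-n, C is m-by-n, all column-major. */
void ref_dgemm(ref_op transA, ref_op transB, int m, int n, int k,
               double alpha, const double *A, int lda,
               const double *B, int ldb,
               double beta, double *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += ref_at(A, lda, transA, i, p) *
                       ref_at(B, ldb, transB, p, j);
            double *c = &C[i + (size_t)j * ldc];
            /* BLAS convention: when beta is zero, C is not read. */
            *c = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

A triple loop like this is useful mainly as a correctness oracle: running it on small inputs and comparing against the device result is a common way to validate a GPU GEMM call.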

The API is exposed as a C-style function in rocBLAS. A typical prototype is: rocblas_status rocblas_dgemm(rocblas_handle handle, rocblas_operation transA, rocblas_operation transB, rocblas_int m, rocblas_int n, rocblas_int k, const double* alpha, const double* A, rocblas_int lda, const double* B, rocblas_int ldb, const double* beta, double* C, rocblas_int ldc). A, B, and C point to device memory; alpha and beta are scalars whose memory location is determined by the current pointer mode of the handle (host or device). The handle controls the execution context and stream association.

Dimensional rules follow op(A) being m-by-k and op(B) being k-by-n, so the result C is m-by-n. Because the leading dimensions describe the matrices as stored, each one is bounded by the row count of the stored matrix: lda ≥ max(1, m) when transA is none and lda ≥ max(1, k) otherwise; ldb ≥ max(1, k) when transB is none and ldb ≥ max(1, n) otherwise; and ldc ≥ max(1, m).

rocblas_dgemm is used as a building block in dense linear algebra on AMD GPUs. In addition to the standard routine, rocBLAS provides batched and strided-batched variants (rocblas_dgemm_batched and rocblas_dgemm_strided_batched), which process multiple GEMMs per call to support higher throughput in applications requiring many small to moderate GEMMs. It is part of the ROCm software stack and integrates with other rocBLAS routines and the ROCm runtime.
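The dimension and leading-dimension constraints can be checked on the host before a call is issued. The sketch below follows the standard BLAS convention that each leading dimension must be at least the number of rows of the matrix as stored (so it depends on whether the operand is transposed). The helper names min_leading_dim and gemm_dims_valid are hypothetical, and a plain bool stands in for rocblas_operation; rocBLAS performs its own argument checks, which this code does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest legal leading dimension for a column-major matrix X, given the
   shape of op(X) as op_rows-by-op_cols. If X is stored transposed, its
   stored row count is op_cols; otherwise it is op_rows. */
int min_leading_dim(bool transposed, int op_rows, int op_cols)
{
    assert(op_rows >= 0 && op_cols >= 0); /* caller checks sizes first */
    int stored_rows = transposed ? op_cols : op_rows;
    return stored_rows > 1 ? stored_rows : 1; /* max(1, stored_rows) */
}

/* Returns true when (m, n, k, lda, ldb, ldc) are consistent for
   C = alpha * op(A) * op(B) + beta * C with op(A) m-by-k, op(B) k-by-n. */
bool gemm_dims_valid(bool transA, bool transB, int m, int n, int k,
                     int lda, int ldb, int ldc)
{
    if (m < 0 || n < 0 || k < 0)
        return false;
    return lda >= min_leading_dim(transA, m, k) &&
           ldb >= min_leading_dim(transB, k, n) &&
           ldc >= min_leading_dim(false, m, n);
}
```

For example, with m = 3, n = 4, k = 5 and a transposed A, the stored matrix is 5-by-3, so lda = 3 is rejected while lda = 5 is accepted.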