rocblas_cgemm
rocblas_cgemm is a routine in rocBLAS, the BLAS library of the ROCm project, that performs general matrix-matrix multiplication on complex single-precision data on AMD GPUs. It computes C = alpha * op(A) * op(B) + beta * C, where alpha and beta are complex scalars and A, B, and C are matrices of complex single-precision elements, with op(A) of size m x k, op(B) of size k x n, and C of size m x n. Each of op(A) and op(B) can be the matrix itself (no transpose), its transpose, or its conjugate transpose. The routine is exposed through a C API that mirrors the standard BLAS interface: it takes a rocblas_handle, the two transposition modes, the dimensions m, n, and k, pointers to alpha and beta, pointers to device memory for A, B, and C, and their leading dimensions. As in standard BLAS, matrices are stored in column-major order. The operation runs on the device as ROCm kernels and exploits GPU parallelism; its performance depends on the memory layout of the operands and on ROCm's kernel optimizations. In addition to the basic single-matrix version, rocBLAS provides batched and strided_batched variants that process many matrix multiplications in a single call.
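To make the formula and the role of the leading dimensions concrete, the following is a minimal CPU reference sketch of the same operation in plain C. It is not rocBLAS code and says nothing about how the GPU kernels are implemented; the names ref_op, op_elem, and ref_cgemm are hypothetical and exist only for this illustration. It assumes column-major storage and treats beta equal to zero as "ignore the old contents of C", following the reference BLAS convention.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for the three transposition modes described above. */
typedef enum { OP_NONE, OP_TRANSPOSE, OP_CONJ_TRANSPOSE } ref_op;

/* Element (i, j) of op(X), where X is stored column-major with leading
 * dimension ldx (the stride, in elements, between consecutive columns). */
static float complex op_elem(ref_op op, const float complex *X, int ldx,
                             int i, int j)
{
    if (op == OP_NONE)
        return X[(size_t)i + (size_t)j * (size_t)ldx];
    /* Transposed access reads element (j, i) of the stored matrix. */
    float complex v = X[(size_t)j + (size_t)i * (size_t)ldx];
    return op == OP_CONJ_TRANSPOSE ? conjf(v) : v;
}

/* CPU reference: C (m x n) = alpha * op(A) (m x k) * op(B) (k x n) + beta * C. */
void ref_cgemm(ref_op transA, ref_op transB, int m, int n, int k,
               float complex alpha, const float complex *A, int lda,
               const float complex *B, int ldb,
               float complex beta, float complex *C, int ldc)
{
    /* Each leading dimension must cover the rows of the matrix as stored. */
    assert(lda >= (transA == OP_NONE ? m : k));
    assert(ldb >= (transB == OP_NONE ? k : n));
    assert(ldc >= m);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float complex acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_elem(transA, A, lda, i, p) *
                       op_elem(transB, B, ldb, p, j);
            size_t idx = (size_t)i + (size_t)j * (size_t)ldc;
            /* With beta == 0 the previous value of C is not read. */
            C[idx] = (beta == 0.0f) ? alpha * acc
                                    : alpha * acc + beta * C[idx];
        }
    }
}
```

A reference of this kind is mainly useful for checking GPU results on small inputs: copy the device output back to the host and compare it element by element against ref_cgemm, using a tolerance suited to single precision.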
Correct usage involves ensuring matrices are stored with the expected leading dimensions, choosing appropriate transposition modes,
rocblas_cgemm is one of several BLAS routines in rocBLAS; its real-valued counterparts are sgemm and dgemm, and the complex