Strided batched

Strided batched is a data layout and interface for performing a batch of identical linear algebra operations where the matrices for each batch item are stored in memory with a fixed stride between consecutive items. This pattern is commonly used for batched matrix-matrix multiplications (GEMMs) and other batched BLAS operations on accelerators, notably GPUs. In a strided batched GEMM, for i from 0 to batchCount-1, the operation is C_i = alpha * op(A_i) * op(B_i) + beta * C_i, where A_i, B_i, and C_i are matrices of fixed shapes, and op denotes optional transposition or conjugate transposition.
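The per-item operation can be sketched as a reference loop in NumPy. This is a minimal illustration of the semantics only; the function name and shapes are invented for this sketch, and real strided batched APIs take a base pointer plus an explicit element stride rather than a 3-D array:

```python
import numpy as np

def gemm_strided_batched(alpha, A, B, beta, C, trans_a=False, trans_b=False):
    """Reference semantics only: C_i = alpha * op(A_i) @ op(B_i) + beta * C_i.

    Here the fixed batch stride is implicit in the leading axis of each
    3-D array; the loop makes the per-item operation explicit.
    """
    for i in range(C.shape[0]):          # i = 0 .. batchCount-1
        a_i = A[i].T if trans_a else A[i]
        b_i = B[i].T if trans_b else B[i]
        C[i] = alpha * (a_i @ b_i) + beta * C[i]
    return C

# Illustrative shapes: 4 batch items, each a (3x5) @ (5x2) product.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
B = rng.standard_normal((4, 5, 2))
out = gemm_strided_batched(2.0, A, B, 0.0, np.zeros((4, 3, 2)))
```

Because every batch item has the same shape and the same scalars, the loop body is identical for all i, which is what lets libraries fuse the whole batch into one kernel launch.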

Typical parameterization (as in cuBLAS GEMM Strided Batched) includes: m, n, k defining the matrix dimensions; alpha and beta scalars; pointers to the first A, B, and C matrices; leading dimensions lda, ldb, ldc; and strides strideA, strideB, strideC that specify the distance in elements between the starts of A_i, B_i, and C_i for consecutive batch indices. The batchCount parameter indicates how many batch items to process. For example, A_i is stored starting at baseA + i * strideA, B_i at baseB + i * strideB, and C_i at baseC + i * strideC. When matrices are stored contiguously (column-major, with no transposition), strideA = lda * k, strideB = ldb * n, and strideC = ldc * n.

Strided batched contrasts with pointer-based batched methods, which pass arrays of pointers to A_i, B_i, and C_i. Strided storage enables better memory coalescing and lower pointer overhead, but requires uniform shapes and regular strides. It is widely supported in high-performance libraries and is commonly used in GPU-accelerated workloads, such as deep learning and scientific computing, where many independent GEMMs of identical sizes must be computed efficiently.
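To make the layout concrete, here is a small NumPy sketch; the sizes, the helper name matrix_at, and the variable names are invented for illustration. It packs a batch contiguously in column-major order, recovers A_i from the offset baseA + i * strideA, and contrasts this with the pointer-based style of one reference per item:

```python
import numpy as np

m, n, k, batch_count = 3, 2, 4, 5         # illustrative sizes
lda = m                                    # tight leading dimension
stride_a = lda * k                         # contiguous case: one full A per item

# One flat buffer holding A_0..A_4 back to back, column-major within each item.
flat_a = np.arange(batch_count * stride_a, dtype=np.float64)

def matrix_at(flat, i, stride, ld, rows, cols):
    """View of the i-th matrix, starting at element offset i * stride."""
    block = flat[i * stride : i * stride + ld * cols]
    # Column-major: each column occupies ld consecutive elements.
    return block.reshape(cols, ld).T[:rows, :]

A_2 = matrix_at(flat_a, 2, stride_a, lda, m, k)

# Pointer-based batching instead passes one reference per item; the matrices
# may live anywhere in memory, at the cost of an indirection per batch index.
a_list = [matrix_at(flat_a, i, stride_a, lda, m, k) for i in range(batch_count)]
```

Because the strided form needs only a base address plus two integers per operand, a kernel can compute every item's address with pure arithmetic, with no pointer table to load.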