CFUGEMM
CFUGEMM stands for Cache-Friendly Unified General Matrix-Matrix Multiply. The term refers to a design approach and family of algorithms for implementing GEMM with a focus on data locality, cache efficiency, and portability across modern hardware platforms. CFUGEMM is not a single library but a set of principles and patterns that guide the construction of dense matrix-matrix multiplication routines in high-performance computing and related fields.
Core ideas of CFUGEMM include tiling or blocking to fit submatrices into fast memory elements, careful loop
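The tiling idea mentioned above can be illustrated with a minimal C sketch. This is not taken from any particular CFUGEMM implementation. The function names gemm_naive and gemm_blocked, the row-major layout, and the tile size TILE = 32 are all assumptions chosen for illustration, and a real routine would tune the tile size to the target cache and add a packed micro-kernel.

```c
#include <stddef.h>

/* Hypothetical tile edge; a real implementation would tune this per cache level. */
#define TILE 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C += A * B, all matrices row-major.
   A is m x k, B is k x n, C is m x n. */
void gemm_naive(size_t m, size_t n, size_t k,
                const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = C[i * n + j];
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

/* Blocked version: same result, but the work is split into TILE x TILE
   sub-blocks so that the pieces of A, B, and C touched by the inner
   loops are small enough to stay in fast memory while they are reused. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    for (size_t i0 = 0; i0 < m; i0 += TILE) {
        size_t i1 = min_sz(i0 + TILE, m);
        for (size_t p0 = 0; p0 < k; p0 += TILE) {
            size_t p1 = min_sz(p0 + TILE, k);
            for (size_t j0 = 0; j0 < n; j0 += TILE) {
                size_t j1 = min_sz(j0 + TILE, n);
                /* Multiply one tile of A by one tile of B into a tile of C. */
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t p = p0; p < p1; ++p) {
                        double a = A[i * k + p];
                        /* Innermost loop walks B and C with unit stride. */
                        for (size_t j = j0; j < j1; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into C, so the caller is expected to zero C first when a plain product is wanted. The min_sz calls handle matrix dimensions that are not multiples of the tile size.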
In practice, CFUGEMM concepts appear in research papers and in various open-source and proprietary libraries as
See also: GEMM, BLAS, cache-friendly algorithms, tiling, micro-kernels.