FMA3FMA4
FMA3FMA4 is a shorthand reference to two related but distinct fused multiply-add instruction formats used on x86 CPUs: FMA3 and FMA4. Both compute a*b + c as a single operation with one rounding step. A separate multiply and add instead rounds twice, once after the product and once after the sum. Fusing the operations can therefore improve both numerical accuracy and performance. The two formats differ in how many operands they take and in where the result is written.
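The accuracy difference comes from the number of roundings. The sketch below emulates the fused semantics in pure Python. It computes a*b + c exactly with `fractions.Fraction` and then rounds once to a double. `fused_multiply_add` and `separate_multiply_add` are hypothetical helper names, and the emulation assumes finite inputs. It models the rounding behaviour only, not the FMA3 or FMA4 instructions themselves.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Hypothetical helper: a*b + c with one rounding (finite inputs only).

    Fraction(x) holds a float's value exactly, so the product and sum are
    exact; float() then rounds that exact value to the nearest double once.
    Signed zeros, infinities and NaNs are not modelled.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Hypothetical helper: a*b + c as two ordinary float operations."""
    product = a * b  # first rounding
    return product + c  # second rounding


a = 1.0 + 2.0**-30  # exactly representable as a double
p = a * a  # exact value is 1 + 2**-29 + 2**-60; the 2**-60 term is rounded away

print(separate_multiply_add(a, a, -p))  # 0.0: the residual was lost in the first rounding
print(fused_multiply_add(a, a, -p) == 2.0**-60)  # True: the residual survives
```

On Python 3.13 or later, `math.fma` provides the same single-rounding operation and can be used to cross-check the emulation.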
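FMA3 refers to the three-operand fused multiply-add form. In FMA3, the operation computes a*b + c in place: the result is written back over one of the three source operands, so that input is destroyed.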
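Because the destination is also an input, FMA3 comes in several operand orderings. In Intel's naming (for example VFMADD132PS, VFMADD213PS and VFMADD231PS), the digits select which operands are multiplied and which is added. The sketch below is a scalar model of those three orderings on a toy register file. The function names and the `xmm1`-`xmm3` keys are illustrative only, and the real instructions operate lane by lane on vector registers.

```python
from fractions import Fraction
from typing import Callable, Dict

Regs = Dict[str, float]


def fma(a: float, b: float, c: float) -> float:
    """Single rounding of the exact value a*b + c (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Scalar model of the three FMA3 orderings. Each writes its result into dst,
# which is also one of the three inputs, so the old dst value is lost.
def vfmadd132(regs: Regs, dst: str, src2: str, src3: str) -> None:
    regs[dst] = fma(regs[dst], regs[src3], regs[src2])  # dst = dst*src3 + src2


def vfmadd213(regs: Regs, dst: str, src2: str, src3: str) -> None:
    regs[dst] = fma(regs[src2], regs[dst], regs[src3])  # dst = src2*dst + src3


def vfmadd231(regs: Regs, dst: str, src2: str, src3: str) -> None:
    regs[dst] = fma(regs[src2], regs[src3], regs[dst])  # dst = src2*src3 + dst


def run_form(form: Callable[[Regs, str, str, str], None]) -> Regs:
    """Apply one ordering to a fresh toy register file and return it."""
    regs = {"xmm1": 2.0, "xmm2": 3.0, "xmm3": 4.0}
    form(regs, "xmm1", "xmm2", "xmm3")
    return regs


for form in (vfmadd132, vfmadd213, vfmadd231):
    print(form.__name__, run_form(form))
# vfmadd132 {'xmm1': 11.0, 'xmm2': 3.0, 'xmm3': 4.0}
# vfmadd213 {'xmm1': 10.0, 'xmm2': 3.0, 'xmm3': 4.0}
# vfmadd231 {'xmm1': 14.0, 'xmm2': 3.0, 'xmm3': 4.0}
```

In every ordering the original value of `xmm1` (2.0) is gone afterwards, which is the destructive behaviour described above.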
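FMA4 is a four-operand variant that explicitly separates the destination from the input operands. Its canonical form is d = a*b + c, where the destination d can be a register distinct from the three sources, so none of the inputs is overwritten.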
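The non-destructive form can be modelled the same way. The sketch below assumes the ordering dst = src1*src2 + src3. `vfmadd4` and the register keys are hypothetical names for a scalar toy model, not an assembler or intrinsic API.

```python
from fractions import Fraction
from typing import Dict

Regs = Dict[str, float]


def fma(a: float, b: float, c: float) -> float:
    """Single rounding of the exact value a*b + c (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmadd4(regs: Regs, dst: str, src1: str, src2: str, src3: str) -> None:
    # Hypothetical four-operand model: dst = src1*src2 + src3.
    # dst need not be one of the sources, so every input survives.
    regs[dst] = fma(regs[src1], regs[src2], regs[src3])


regs = {"xmm0": 0.0, "xmm1": 2.0, "xmm2": 3.0, "xmm3": 4.0, "xmm4": 5.0}
vfmadd4(regs, "xmm0", "xmm1", "xmm2", "xmm3")  # xmm0 = 2*3 + 4
vfmadd4(regs, "xmm5", "xmm1", "xmm2", "xmm4")  # reuses xmm1 and xmm2: 2*3 + 5
print(regs)
# {'xmm0': 10.0, 'xmm1': 2.0, 'xmm2': 3.0, 'xmm3': 4.0, 'xmm4': 5.0, 'xmm5': 11.0}
```

Both results are produced while all four inputs keep their values. With a three-operand form, keeping every input would first require copying one of them into the destination register.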
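In practice, most current software targets FMA3, which is implemented by both Intel and AMD processors, whereas FMA4 was implemented only by some AMD processors. Programs typically rely on compiler intrinsics or auto-vectorization to emit FMA3 instructions.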