FMA3FMA4

FMA3FMA4 is a shorthand reference to two related, but distinct, fused multiply-add instruction formats used on x86 CPUs: FMA3 and FMA4. Both execute a multiply and an addition as a single operation with a single rounding step, which can improve numerical accuracy and performance compared to separate multiply and add instructions. The two formats differ in how many operands they use and how the result is written back to registers.

FMA3 refers to the three-operand fused multiply-add form. In FMA3, the operation computes a*b + c in one instruction, with the result written to a destination register that is also one of the three operands, so one input is overwritten. This design allows compact encoding and is widely supported by modern Intel and AMD processors. FMA3 reduces rounding error relative to performing the multiply and add separately and can improve throughput in floating-point-heavy code.

FMA4 is a four-operand variant that explicitly separates the destination from the input operands. Its canonical form is d = a*b + c, where d can be a register distinct from a, b, and c. FMA4 originated with AMD’s Bulldozer-era architectures and provides greater flexibility in register usage, which can reduce the need to overwrite input registers. However, FMA4 has seen limited adoption beyond certain AMD generations, was not implemented by Intel, and is far less broadly supported than FMA3 on modern CPUs.

In practice, most current software targets FMA3, using compiler intrinsics or auto-vectorization to emit FMA3 instructions. FMA4 support is less common and can complicate portability across processors. Understanding the specific microarchitecture and compiler support is important when optimizing numerical code for fused multiply-add operations.