FMMLA

Floating-point matrix multiply-accumulate

The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the Effective SVE vector length is at least 256 bits.

ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.

ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.

This instruction is illegal when executed in Streaming SVE mode, unless FEAT_SME_FA64 is implemented and enabled.

It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element
(FEAT_F32MM)

313029282726252423222120191817161514131211109876543210
01100100101Zm111001ZnZda

FMMLA <Zda>.S, <Zn>.S, <Zm>.S

if !HaveSVE() || !HaveSVEFP32MatMulExt() then UNDEFINED; constant integer esize = 32; integer n = UInt(Zn); integer m = UInt(Zm); integer da = UInt(Zda);

64-bit element
(FEAT_F64MM)

313029282726252423222120191817161514131211109876543210
01100100111Zm111001ZnZda

FMMLA <Zda>.D, <Zn>.D, <Zm>.D

if !HaveSVE() || !HaveSVEFP64MatMulExt() then UNDEFINED; constant integer esize = 64; integer n = UInt(Zn); integer m = UInt(Zm); integer da = UInt(Zda);

Assembler Symbols

<Zda>

Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<Zn>

Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Zm>

Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckNonStreamingSVEEnabled(); constant integer VL = CurrentVL; constant integer PL = VL DIV 8; if VL < esize * 4 then UNDEFINED; constant integer segments = VL DIV (4 * esize); bits(VL) operand1 = Z[n, VL]; bits(VL) operand2 = Z[m, VL]; bits(VL) operand3 = Z[da, VL]; bits(VL) result = Zeros(VL); bits(4*esize) op1, op2; bits(4*esize) res, addend; for s = 0 to segments-1 op1 = Elem[operand1, s, 4*esize]; op2 = Elem[operand2, s, 4*esize]; addend = Elem[operand3, s, 4*esize]; res = FPMatMulAdd(addend, op1, op2, esize, FPCR); Elem[result, s, 4*esize] = res; Z[da, VL] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is unpredictable:


Internal version only: aarchmrs v2023-12_rel, pseudocode v2023-12_rel, sve v2023-12_rel ; Build timestamp: 2023-12-15T16:46

Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.