BFloat16 fused multiply-subtract
This instruction multiplies the active BFloat16 elements of the first source vector by the corresponding BFloat16 elements of the second source vector. The results are then subtracted from elements of the third source (addend) vector without intermediate rounding and destructively placed in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
This instruction follows SVE2 non-widening BFloat16 numerical behaviors.
ID_AA64ZFR0_EL1.B16B16 indicates whether this instruction is implemented.
| 31-24 | 23-22 | 21 | 20-16 | 15-14 | 13 | 12-10 | 9-5 | 4-0 |
| 0 1 1 0 0 1 0 1 | 0 0 | 1 | Zm | 0 0 | 1 | Pg | Zn | Zda |
| | size | | | | op | | | |
if !IsFeatureImplemented(FEAT_SVE_B16B16) then EndOfDecode(Decode_UNDEF); end;
let g : integer = UInt(Pg);
let n : integer = UInt(Zn);
let m : integer = UInt(Zm);
let da : integer = UInt(Zda);
let op1_neg : boolean = TRUE;
let op3_neg : boolean = FALSE;
| <Zda> | Is the name of the third source and destination scalable vector register, encoded in the "Zda" field. |
| <Pg> | Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field. |
| <Zn> | Is the name of the first source scalable vector register, encoded in the "Zn" field. |
| <Zm> | Is the name of the second source scalable vector register, encoded in the "Zm" field. |
if IsFeatureImplemented(FEAT_SME2) then CheckSVEEnabled(); else CheckNonStreamingSVEEnabled(); end;
let VL : integer{} = CurrentVL();
let PL : integer{} = VL DIV 8;
let elements : integer = VL DIV 16;
let mask : bits(PL) = P{}(g);
let op1 : bits(VL) = if AnyActiveElement{PL}(mask, 16) then Z{VL}(n) else Zeros{VL};
let op2 : bits(VL) = if AnyActiveElement{PL}(mask, 16) then Z{VL}(m) else Zeros{VL};
let op3 : bits(VL) = Z{}(da);
var result : bits(VL);

for e = 0 to elements-1 do
    if ActivePredicateElement{PL}(mask, e, 16) then
        let elem1 : bits(16) = if op1_neg then BFNeg(op1[e*:16]) else op1[e*:16];
        let elem2 : bits(16) = op2[e*:16];
        let elem3 : bits(16) = if op3_neg then BFNeg(op3[e*:16]) else op3[e*:16];
        result[e*:16] = BFMulAdd(elem3, elem1, elem2, FPCR());
    else
        result[e*:16] = op3[e*:16];
    end;
end;

Z{VL}(da) = result;
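The per-element arithmetic can be approximated with the Python sketch below, which computes addend - (first source × second source) exactly and rounds once at the end. It is a simplified model under stated assumptions, not the architected BFMulAdd behavior: all helper names are hypothetical, rounding is fixed to round-to-nearest-even rather than taken from FPCR, infinities and NaNs are rejected, an exactly zero result is returned as +0, and the predicate is modeled as one boolean per 16-bit element rather than as predicate register bits.

```python
from fractions import Fraction


def bf16_to_fraction(bits):
    """Decode a finite BFloat16 bit pattern (1 sign, 8 exponent,
    7 fraction bits) into an exact Fraction."""
    sign = -1 if bits & 0x8000 else 1
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0xFF:
        raise ValueError("infinities and NaNs are outside this sketch")
    if exp == 0:  # zero or subnormal
        return sign * Fraction(frac, 128) * Fraction(2) ** -126
    return sign * Fraction(128 + frac, 128) * Fraction(2) ** (exp - 127)


def fraction_to_bf16(x):
    """Round an exact value to BFloat16, round-to-nearest-even (assumed)."""
    if x == 0:
        return 0x0000  # simplification: exact zero is returned as +0
    sign = 0x8000 if x < 0 else 0
    a = abs(x)
    # Find e with 2**e <= a < 2**(e + 1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    e = max(e, -126)  # subnormals share the minimum exponent
    # Scale so that one unit equals the last fraction bit, then round once.
    q = round(a / Fraction(2) ** (e - 7))  # Fraction rounds half to even
    if q == 256:  # rounding carried into the next binade
        q, e = 128, e + 1
    if e > 127:
        return sign | 0x7F80  # overflow to infinity
    if q < 128:
        return sign | q  # subnormal or zero
    return sign | ((e + 127) << 7) | (q - 128)


def bfmls(zda, pg, zn, zm):
    """Model of the element loop: active elements get addend - n*m with a
    single rounding; inactive elements keep the addend unchanged."""
    out = []
    for addend, active, n, m in zip(zda, pg, zn, zm):
        if active:
            exact = bf16_to_fraction(addend) - bf16_to_fraction(n) * bf16_to_fraction(m)
            out.append(fraction_to_bf16(exact))
        else:
            out.append(addend)
    return out


if __name__ == "__main__":
    # 10.0 - 2.0 * 3.0 in the active lane; the inactive lane is unchanged.
    print([hex(v) for v in bfmls([0x4120, 0x4120], [True, False],
                                 [0x4000, 0x4000], [0x4040, 0x4040])])
```

The absence of intermediate rounding is visible with inputs such as an addend of 0x3F82 and both sources equal to 0x3F81: the exact product is 1 + 2^-6 + 2^-14, so the fused result is -2^-14 (0xB880), whereas rounding the product to BFloat16 first would yield zero.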
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is CONSTRAINED UNPREDICTABLE:
2026-03_rel 2026-03-26 20:48:11
Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.