BFloat16 dot product to single-precision (vector, by element)
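This instruction divides the source vectors into pairs of BFloat16 elements. Each pair in the first source vector is multiplied element-wise with a single pair from the second source vector, and the two products are summed. The result is accumulated into the single-precision destination element that overlaps that pair. The pair within the second source vector is selected by an immediate index in the range 0 to 3 inclusive.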
If FEAT_EBF16 is not implemented or FPCR.EBF is 0, this instruction:
If FEAT_EBF16 is implemented and FPCR.EBF is 1, then this instruction:
Irrespective of FEAT_EBF16 and FPCR.EBF, this instruction:
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
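A raw ID_AA64ISAR1_EL1 value can be interpreted with a Python sketch like the one below. It rests on two assumptions that should be checked against the register description. First, the BF16 field occupies bits [47:44]. Second, 0b0001 indicates FEAT_BF16, and 0b0010 additionally indicates FEAT_EBF16 (FPCR.EBF). The helper names are hypothetical. The sketch only decodes a 64-bit value obtained elsewhere and does not read the register itself.

```python
# Hypothetical helpers for interpreting a raw ID_AA64ISAR1_EL1 value.
# ASSUMPTION: the BF16 field is bits [47:44]; verify against the Arm ARM.
BF16_SHIFT = 44
BF16_MASK = 0xF


def bf16_field(id_aa64isar1_el1: int) -> int:
    """Return the 4-bit BF16 field of an ID_AA64ISAR1_EL1 value."""
    return (id_aa64isar1_el1 >> BF16_SHIFT) & BF16_MASK


def bfdot_supported(id_aa64isar1_el1: int) -> bool:
    # Any non-zero field value indicates FEAT_BF16, which provides BFDOT.
    return bf16_field(id_aa64isar1_el1) >= 1


def ebf16_supported(id_aa64isar1_el1: int) -> bool:
    # ASSUMPTION: a field value of 0b0010 or higher indicates FEAT_EBF16.
    return bf16_field(id_aa64isar1_el1) >= 2
```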
| Bit(s) | 31 | 30 | 29 | 28:24 | 23:22 | 21 | 20 | 19:16 | 15:12 | 11 | 10 | 9:5 | 4:0 |
|--------|----|----|----|-------|-------|----|----|-------|-------|----|----|-----|-----|
| Value | 0 | Q | 0 | 01111 | 01 | L | M | Rm | 1111 | H | 0 | Rn | Rd |
| Field | | | U | | size | | | | opcode | | | | |
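The following hypothetical Python helper illustrates the field layout by packing Q, the three register numbers and the pair index into a 32-bit instruction word. The base constant 0x0F40F000 is derived here from the fixed bits in the encoding table: bits 28:24 = 01111, size = 01, opcode = 1111, with bits 31, 29 and 10 clear. It has not been cross-checked against an assembler.

```python
# Fixed bits of the by-element BFDOT encoding, derived from the table:
# bits 28:24 = 01111, size (23:22) = 01, opcode (15:12) = 1111.
BFDOT_ELEM_BASE = 0x0F40F000


def encode_bfdot_elem(q: int, vd: int, vn: int, vm: int, index: int) -> int:
    """Hypothetical encoder: pack the variable fields into a 32-bit word."""
    if q not in (0, 1):
        raise ValueError("Q must be 0 or 1")
    for name, reg in (("Vd", vd), ("Vn", vn), ("Vm", vm)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must be in 0..31")
    if not 0 <= index <= 3:
        raise ValueError("index must be in 0..3")
    m_bit, rm = vm >> 4, vm & 0xF          # Vm is split across M:Rm
    h_bit, l_bit = index >> 1, index & 1   # index is split across H:L
    return (BFDOT_ELEM_BASE
            | q << 30
            | l_bit << 21
            | m_bit << 20
            | rm << 16
            | h_bit << 11
            | vn << 5
            | vd)
```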
    if !IsFeatureImplemented(FEAT_BF16) then
        EndOfDecode(Decode_UNDEF);
    end;
    let n : integer = UInt(Rn);
    let m : integer = UInt(M::Rm);
    let d : integer = UInt(Rd);
    let i : integer = UInt(H::L);
    let datasize : integer{} = 64 << UInt(Q);
    let elements : integer = datasize DIV 32;
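A Python rendering of the decode pseudocode could look like the sketch below. The names `decode_bfdot_elem`, `BfdotElemFields` and the match mask are hypothetical. Raising `ValueError` stands in for `EndOfDecode(Decode_UNDEF)`, and the `has_bf16` flag stands in for `IsFeatureImplemented(FEAT_BF16)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdotElemFields:
    n: int          # UInt(Rn)
    m: int          # UInt(M:Rm)
    d: int          # UInt(Rd)
    i: int          # UInt(H:L), pair index 0..3
    datasize: int   # 64 << UInt(Q)
    elements: int   # datasize DIV 32


def bit(word: int, pos: int) -> int:
    return (word >> pos) & 1


def field(word: int, hi: int, lo: int) -> int:
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


MATCH_MASK = 0xBFC0F400   # all fixed bits: 31, 29:22, 15:12 and 10
MATCH_VALUE = 0x0F40F000


def decode_bfdot_elem(word: int, has_bf16: bool = True) -> BfdotElemFields:
    if (word & MATCH_MASK) != MATCH_VALUE:
        raise ValueError("not a BFDOT (by element) encoding")
    if not has_bf16:
        # Mirrors EndOfDecode(Decode_UNDEF) when FEAT_BF16 is absent.
        raise ValueError("UNDEFINED: FEAT_BF16 not implemented")
    datasize = 64 << bit(word, 30)
    return BfdotElemFields(
        n=field(word, 9, 5),
        m=(bit(word, 20) << 4) | field(word, 19, 16),
        d=field(word, 4, 0),
        i=(bit(word, 11) << 1) | bit(word, 21),
        datasize=datasize,
        elements=datasize // 32,
    )
```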
| Symbol | Description |
|--------|-------------|
| <Vd> | Is the name of the SIMD&FP destination register, encoded in the "Rd" field. |
| <Ta> | Is an arrangement specifier, encoded in the "Q" field: 2S when Q = 0, 4S when Q = 1. |
| <Vn> | Is the name of the first SIMD&FP source register, encoded in the "Rn" field. |
| <Tb> | Is an arrangement specifier, encoded in the "Q" field: 4H when Q = 0, 8H when Q = 1. |
| <Vm> | Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields. |
| <index> | Is the immediate index of a pair of 16-bit elements in the range 0 to 3, encoded in the "H:L" fields. |
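Because datasize is 64 << UInt(Q), Q selects both arrangement specifiers. It gives two or four single-precision destination elements, and four or eight BFloat16 source elements. The sketch below formats the operands. It assumes the usual assembler template `BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]`, which this extract does not reproduce. The function name is hypothetical.

```python
# Q selects both arrangement specifiers (datasize = 64 << Q).
TA_BY_Q = {0: "2S", 1: "4S"}   # single-precision destination elements
TB_BY_Q = {0: "4H", 1: "8H"}   # BFloat16 elements of the first source


def format_bfdot_elem(q: int, d: int, n: int, m: int, index: int) -> str:
    """Hypothetical formatter; assumes the template BFDOT Vd.Ta, Vn.Tb, Vm.2H[index]."""
    if q not in TA_BY_Q:
        raise ValueError("Q must be 0 or 1")
    if not all(0 <= r <= 31 for r in (d, n, m)):
        raise ValueError("register numbers must be in 0..31")
    if not 0 <= index <= 3:
        raise ValueError("index must be in 0..3")
    return f"BFDOT V{d}.{TA_BY_Q[q]}, V{n}.{TB_BY_Q[q]}, V{m}.2H[{index}]"
```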
    AArch64_CheckFPAdvSIMDEnabled();
    let operand1 : bits(datasize) = V{}(n);
    let operand2 : bits(128) = V{}(m);
    let operand3 : bits(datasize) = V{}(d);
    var result : bits(datasize);
    for e = 0 to elements-1 do
        let elt1_a : bits(16) = operand1[(2 * e + 0)*:16];
        let elt1_b : bits(16) = operand1[(2 * e + 1)*:16];
        let elt2_a : bits(16) = operand2[(2 * i + 0)*:16];
        let elt2_b : bits(16) = operand2[(2 * i + 1)*:16];
        let sum : bits(32) = operand3[e*:32];
        result[e*:32] = BFDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR());
    end;
    V{datasize}(d) = result;
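The lane structure of the operation pseudocode can be modelled in Python as follows. This is only an approximation. `BFDotAdd` is not reimplemented, so the rounding selected by FEAT_EBF16 and FPCR.EBF is not modelled. Denormal flushing, NaN handling and exception behaviour are also left out. Instead, the products and the sum are formed in binary64 and then rounded to binary32. The function `bfdot_by_element` and its list-of-bit-patterns operands are hypothetical.

```python
import struct


def bf16_to_f32(bits16: int) -> float:
    # A BFloat16 value is the upper half of an IEEE 754 single; widen by shifting.
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def to_f32(x: float) -> float:
    # Round a binary64 value to binary32 (round to nearest even).
    # struct raises OverflowError for finite values outside the binary32 range.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bfdot_by_element(op1, op2, acc, i):
    """Approximate model of BFDOT (by element).

    op1: 4 or 8 BFloat16 bit patterns (Vn, 64- or 128-bit)
    op2: 8 BFloat16 bit patterns (all 128 bits of Vm)
    acc: one single-precision accumulator per pair in op1 (Vd)
    i:   pair index 0..3 into op2
    """
    if len(op1) not in (4, 8) or len(op2) != 8:
        raise ValueError("op1 must hold 4 or 8 lanes and op2 must hold 8")
    elements = len(op1) // 2
    if len(acc) != elements or not 0 <= i <= 3:
        raise ValueError("accumulator count or index out of range")
    # The same indexed pair from op2 is reused for every destination element.
    elt2_a = bf16_to_f32(op2[2 * i])
    elt2_b = bf16_to_f32(op2[2 * i + 1])
    result = []
    for e in range(elements):
        elt1_a = bf16_to_f32(op1[2 * e])
        elt1_b = bf16_to_f32(op1[2 * e + 1])
        # Not bit-exact to BFDotAdd: products are exact in binary64, the
        # additions happen in binary64, then the total is rounded to binary32.
        result.append(to_f32(acc[e] + elt1_a * elt2_a + elt1_b * elt2_b))
    return result
```

For example, with 0x3F80, 0x4000, 0x4040 and 0x4080 encoding 1.0, 2.0, 3.0 and 4.0, `bfdot_by_element([0x3F80, 0x4000, 0x3F80, 0x3F80], [0, 0, 0x4040, 0x4080, 0, 0, 0, 0], [0.5, 1.0], 1)` selects the pair (3.0, 4.0) and returns `[11.5, 8.0]`.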
2026-03_rel 2026-03-26 20:48:11
Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.