UTMOPA (2-way) -- A64

Unsigned 16-bit integer sparse sum of outer products to 32-bit integer, accumulating

This instruction generates an unsigned integer sum of outer products by multiplying the 2-in-4 selected elements of the dense sub-matrices in the two first source vectors by the corresponding elements of the compressed sparse sub-matrix in the second source vector, and accumulates the results into the corresponding elements of a 32-bit element ZA tile.

The sum of outer products is generated by multiplying the two selected (2-in-4) 16-bit unsigned values from each pair of overlapping 32-bit containers of the two SVL_S×2 sub-matrices in the first source vectors by the two 16-bit unsigned values from the corresponding 32-bit container of the 2×SVL_S sub-matrix in the second source vector. The two elements selected from each pair of overlapping 32-bit containers in the first source vectors correspond to 2-in-4 elements of a row of the two SVL_S×2 sub-matrices. Each 32-bit container of the second source vector holds the 2 elements of one column of the compressed 2×SVL_S sub-matrix.
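The container layout can be sketched in Python as follows. This is an illustrative model, assuming each Z register is held as a plain Python integer with 16-bit element 0 in bits 15:0, which matches the `op[(idx)*:16]` slicing in the Operation pseudocode; the helper names `element16`, `row_candidates` and `column_pair` are hypothetical. The four candidates for a row are listed in the same order as the control bits 2*r + e used by the pseudocode (Zn1 low half, Zn1 high half, Zn2 low half, Zn2 high half).

```python
def element16(reg: int, idx: int) -> int:
    """Hypothetical helper: 16-bit element `idx` of a register held as an int."""
    return (reg >> (16 * idx)) & 0xFFFF


def row_candidates(zn1: int, zn2: int, row: int) -> list[int]:
    """The 4 candidate elements for `row`: both halves of 32-bit container
    `row` in Zn1, followed by both halves of the overlapping container in Zn2."""
    return [
        element16(zn1, 2 * row),      # bit 0 of the 4-bit control
        element16(zn1, 2 * row + 1),  # bit 1
        element16(zn2, 2 * row),      # bit 2
        element16(zn2, 2 * row + 1),  # bit 3
    ]


def column_pair(zm: int, col: int) -> tuple[int, int]:
    """The 2 compressed values of column `col`: both halves of container `col` of Zm."""
    return element16(zm, 2 * col), element16(zm, 2 * col + 1)


# Example: Zn1 holds elements 1, 2, 3, 4; Zn2 holds 0x10, 0x20, 0x30, 0x40.
zn1 = 0x0004_0003_0002_0001
zn2 = 0x0040_0030_0020_0010
print(row_candidates(zn1, zn2, 1))  # [3, 4, 48, 64]
```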

The 2-in-4 16-bit unsigned values from overlapping 32-bit containers of the first source vectors are selected by 4-bit controls in the indexed segment of the control vector register. If the control bit corresponding to an element in the first source vectors is 0, the element is discarded and does not contribute to the sum of products. If more than two bits of the 4-bit control applied to a pair of overlapping 32-bit containers of the first source vectors are 1, only the elements corresponding to the two least significant set bits are selected. If fewer than two bits are 1, each missing element is treated as zero.
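The selection rule can be sketched as a small Python function. It is a minimal model, assuming the four candidates are ordered by control-bit number as in the Operation pseudocode (bit 2*r + e selects element e of the overlapping container in source register r); the name `select_2_in_4` is hypothetical. Bits are scanned from bit 0 upwards and scanning stops after two selections, which is what keeps only the two least significant set bits.

```python
def select_2_in_4(control: int, candidates: list[int]) -> list[int]:
    """Return the 2 selected values for one 4-bit control.

    Slots that are not filled (fewer than two control bits set) stay 0,
    so they add nothing to the dot product.
    """
    selected = [0, 0]
    count = 0
    for bit in range(4):
        if count < 2 and (control >> bit) & 1:
            selected[count] = candidates[bit]
            count += 1
    return selected


cands = [11, 22, 33, 44]
print(select_2_in_4(0b1010, cands))  # [22, 44]
print(select_2_in_4(0b1111, cands))  # [11, 22]: extra set bits are ignored
print(select_2_in_4(0b0100, cands))  # [33, 0]
```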

The resulting SVL_S×SVL_S widened 32-bit integer sum of outer products is then destructively added to the 32-bit integer destination tile. This is equivalent to performing a 2-way dot product and accumulating the result into each element of the destination tile.
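Per destination element, the accumulate step reduces to the arithmetic below. This Python sketch assumes, as in the Operation pseudocode, that the running sum is a 32-bit value, so overflow wraps modulo 2^32; `dot2_accumulate` is a hypothetical name. Each 16×16-bit product fits in 32 bits, but the sum of two products plus the accumulator can wrap.

```python
MASK32 = 0xFFFF_FFFF


def dot2_accumulate(acc: int, row_pair: list[int], col_pair: list[int]) -> int:
    """2-way unsigned dot product of 16-bit values, added to a 32-bit accumulator."""
    total = acc + row_pair[0] * col_pair[0] + row_pair[1] * col_pair[1]
    return total & MASK32  # keep the low 32 bits, like a bits(32) sum


print(dot2_accumulate(10, [2, 3], [4, 5]))  # 33
print(hex(dot2_accumulate(0, [0xFFFF, 0xFFFF], [0xFFFF, 0xFFFF])))  # 0xfffc0002
```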

SME2
(FEAT_SME_TMOP)

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_TMOP) then
    EndOfDecode(Decode_UNDEF);
end;
let n : integer = UInt(Zn::'0');
let m : integer = UInt(Zm);
let k : integer = UInt('1'::K::'1'::Zk);
let index : integer = UInt(i2);
let da : integer = UInt(ZAda);
let unsigned : boolean = TRUE;
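The decode arithmetic can be mirrored in Python to show how the register numbers are formed. This sketch takes the field positions from the encoding diagram for this instruction (Zm in bits 20:16, K in bit 12, Zk in bits 11:10, Zn in bits 9:6, i2 in bits 5:4, ZAda in bits 1:0) and checks the fixed opcode bits; it does not model the FEAT_SME_TMOP check, and the name `decode_utmopa_2way` is hypothetical.

```python
FIXED_MASK = 0xFFE0_E00C  # bits 31:21, 15:13 and 3:2 are fixed opcode bits
FIXED_BITS = 0x8140_8008  # their required values for UTMOPA (2-way)


def decode_utmopa_2way(word: int) -> dict:
    if (word & FIXED_MASK) != FIXED_BITS:
        raise ValueError("not a UTMOPA (2-way) encoding")
    zm = (word >> 16) & 0x1F
    k_bit = (word >> 12) & 0x1
    zk = (word >> 10) & 0x3
    zn = (word >> 6) & 0xF
    i2 = (word >> 4) & 0x3
    zada = word & 0x3
    return {
        "n": zn << 1,                      # UInt(Zn::'0'): always even
        "m": zm,                           # UInt(Zm)
        "k": 0b10100 | (k_bit << 3) | zk,  # UInt('1'::K::'1'::Zk): Z20-Z23 or Z28-Z31
        "index": i2,                       # UInt(i2)
        "da": zada,                        # UInt(ZAda)
        "unsigned": True,
    }


print(decode_utmopa_2way(0x8143_9469))
# {'n': 2, 'm': 3, 'k': 29, 'index': 2, 'da': 1, 'unsigned': True}
```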

Assembler Symbols

<ZAda>

Is the name of the ZA tile ZA0-ZA3, encoded in the "ZAda" field.

<Zn1>

Is the name of the first scalable vector register of the first source multi-vector group, encoded as "Zn" times 2.

<Zn2>

Is the name of the second scalable vector register of the first source multi-vector group, encoded as "Zn" times 2 plus 1.

<Zm>

Is the name of the second source scalable vector register, encoded in the "Zm" field.

<Zk>

Is the name of the control vector register Z20-Z23 or Z28-Z31, encoded in the "K:Zk" fields.

<index>

Is the control segment index, in the range 0 to 3, encoded in the "i2" field.

Operation

CheckStreamingSVEAndZAEnabled();
let VL : integer{} = CurrentVL();
let dim : integer{} = VL DIV 32;
let csize : integer{} = VL DIV 8;
let op2 : bits(VL) = Z{}(m);
let op3 : bits(VL) = Z{}(k);
let ctrl : bits(csize) = op3[index*:csize];
let op4 : bits(dim*dim*32) = ZAtile{}(da, 32);
var result : bits(dim*dim*32);

for row = 0 to dim-1 do
    for col = 0 to dim-1 do
        var erow : array [[2]] of bits(16);
        var ecol : array [[2]] of bits(16);
        for j = 0 to 1 do
            erow[[j]] = Zeros{16};
            ecol[[j]] = op2[(2*col + j)*:16];
        end;
        var i : integer = 0;
        for r = 0 to 1 do
            let op1 : bits(VL) = Z{}(n+r);
            for e = 0 to 1 do
                if i < 2 && ctrl[(4*col + 2*r + e)*:1] == '1' then
                    erow[[i]] = op1[(2*row + e)*:16];
                    i = i + 1;
                end;
            end;
        end;
        var sum : bits(32) = op4[(row*dim+col)*:32];
        for j = 0 to 1 do
            let erowval : integer = if unsigned then UInt(erow[[j]]) else SInt(erow[[j]]);
            let ecolval : integer = if unsigned then UInt(ecol[[j]]) else SInt(ecol[[j]]);
            sum = sum + (erowval * ecolval);
        end;
        result[(row*dim+col)*:32] = sum;
    end;
end;
ZAtile{dim*dim*32}(da, 32) = result;
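For experimentation, the Operation pseudocode can be transcribed into a Python reference model. This is a sketch, not an Arm-provided model: it assumes Zn1, Zn2 and Zm are given as lists of VL/16 unsigned 16-bit elements, the control register Zk as a VL-bit integer, and the ZA tile as a dim×dim list of 32-bit values; the function name `utmopa_2way_model` is hypothetical. The streaming-mode check (`CheckStreamingSVEAndZAEnabled`) is not modelled, and the input tile is left unmodified (a new tile is returned).

```python
def utmopa_2way_model(za, zn1, zn2, zm, zk, index, vl):
    dim = vl // 32    # SVL_S: number of 32-bit containers per vector
    csize = vl // 8   # bits in one control segment (4 bits per column)
    ctrl = (zk >> (index * csize)) & ((1 << csize) - 1)
    result = [[0] * dim for _ in range(dim)]
    for row in range(dim):
        for col in range(dim):
            # Compressed column values: both halves of container `col` of Zm.
            ecol = [zm[2 * col], zm[2 * col + 1]]
            # Select up to 2 of the 4 row candidates using this column's control bits.
            erow = [0, 0]
            i = 0
            for r, op1 in enumerate((zn1, zn2)):
                for e in range(2):
                    if i < 2 and (ctrl >> (4 * col + 2 * r + e)) & 1:
                        erow[i] = op1[2 * row + e]
                        i += 1
            acc = za[row][col]
            for j in range(2):
                acc += erow[j] * ecol[j]
            result[row][col] = acc & 0xFFFF_FFFF  # 32-bit wraparound
    return result


vl = 128
zn1 = [1, 2, 3, 4, 5, 6, 7, 8]
zn2 = [10, 20, 30, 40, 50, 60, 70, 80]
zm = [1, 1, 2, 2, 3, 3, 4, 4]
zk = 0xF5C3  # per-column controls (segment 0): 0b0011, 0b1100, 0b0101, 0b1111
za = [[0] * 4 for _ in range(4)]
print(utmopa_2way_model(za, zn1, zn2, zm, zk, index=0, vl=vl)[0])  # [3, 60, 33, 12]
```

Column 3 has all four control bits set, so only the two Zn1 elements are used, in line with the two-least-significant-set-bits rule.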

Operational information

This instruction is a data-independent-time instruction as described in About PSTATE.DIT.

2026-03_rel 2026-03-26 20:48:11

Encoding

Bits	31:25	24	23:21	20:16	15:13	12	11:10	9:6	5:4	3:2	1:0
Value	1000000	1	010	Zm	100	K	Zk	Zn	i2	10	ZAda
Field		u0
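Going the other way, an assembler-side encoder can be sketched from the same bit layout. This Python helper is hypothetical and only packs fields: it assumes <Zn1> must be even (it is encoded as Zn times 2), <Zk> must be one of Z20-Z23 or Z28-Z31, and <index> is 0 to 3, as stated in the assembler symbols for this instruction.

```python
FIXED_BITS = 0x8140_8008  # opcode bits 31:21, 15:13 and 3:2 for UTMOPA (2-way)


def encode_utmopa_2way(zada: int, zn1: int, zm: int, zk: int, index: int) -> int:
    if not 0 <= zada <= 3:
        raise ValueError("ZAda must be ZA0-ZA3")
    if zn1 % 2 or not 0 <= zn1 <= 30:
        raise ValueError("Zn1 must be an even register Z0-Z30")
    if not 0 <= zm <= 31:
        raise ValueError("Zm must be Z0-Z31")
    if zk not in range(20, 24) and zk not in range(28, 32):
        raise ValueError("Zk must be Z20-Z23 or Z28-Z31")
    if not 0 <= index <= 3:
        raise ValueError("index must be 0-3")
    k_bit = (zk >> 3) & 1   # 0 for Z20-Z23, 1 for Z28-Z31
    zk_field = zk & 0b11    # low two bits of the register number
    return (FIXED_BITS | (zm << 16) | (k_bit << 12) | (zk_field << 10)
            | ((zn1 >> 1) << 6) | (index << 4) | zada)


print(hex(encode_utmopa_2way(zada=1, zn1=2, zm=3, zk=29, index=2)))  # 0x81439469
```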
