STNT1B (scalar plus immediate, strided registers)

Contiguous store non-temporal of bytes from multiple strided vectors (immediate index)

This instruction performs a contiguous non-temporal store of bytes from elements of two or four strided vector registers to the memory address generated by a 64-bit scalar base and immediate index that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.

Inactive elements are not written to memory.

A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.

It has encodings from 2 classes: Two registers and Four registers

Two registers
(FEAT_SME2)

313029282726252423222120191817161514131211109876543210
101000010110imm4000PNgRnT1Zt
mszN

Encoding

STNT1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME2) then EndOfDecode(Decode_UNDEF); end; let n : integer = UInt(Rn); let g : integer = UInt('1'::PNg); let nreg : integer{} = 2; let tstride : integer = 8; let t : integer = UInt(T::'0'::Zt); let esize : integer{} = 8; let offset : integer = SInt(imm4);

Four registers
(FEAT_SME2)

313029282726252423222120191817161514131211109876543210
101000010110imm4100PNgRnT10Zt
mszN

Encoding

STNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME2) then EndOfDecode(Decode_UNDEF); end; let n : integer = UInt(Rn); let g : integer = UInt('1'::PNg); let nreg : integer{} = 4; let tstride : integer = 4; let t : integer = UInt(T::'00'::Zt); let esize : integer{} = 8; let offset : integer = SInt(imm4);

Assembler Symbols

<Zt1>

For the "Two registers" variant: is the name of the first scalable vector register Z0-Z7 or Z16-Z23 to be transferred, encoded as "T:'0':Zt".

For the "Four registers" variant: is the name of the first scalable vector register Z0-Z3 or Z16-Z19 to be transferred, encoded as "T:'00':Zt".

<Zt2>

For the "Two registers" variant: is the name of the second scalable vector register Z8-Z15 or Z24-Z31 to be transferred, encoded as "T:'1':Zt".

For the "Four registers" variant: is the name of the second scalable vector register Z4-Z7 or Z20-Z23 to be transferred, encoded as "T:'01':Zt".

<PNg>

Is the name of the governing scalable predicate register PN8-PN15, with predicate-as-counter encoding, encoded in the "PNg" field.

<Xn|SP>

Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm>

For the "Two registers" variant: is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0, encoded in the "imm4" field.

For the "Four registers" variant: is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0, encoded in the "imm4" field.

<Zt3>

Is the name of the third scalable vector register Z8-Z11 or Z24-Z27 to be transferred, encoded as "T:'10':Zt".

<Zt4>

Is the name of the fourth scalable vector register Z12-Z15 or Z28-Z31 to be transferred, encoded as "T:'11':Zt".

Operation

CheckStreamingSVEEnabled(); let VL : integer{} = CurrentVL(); let PL : integer{} = VL DIV 8; let elements : integer = VL DIV esize; let mbytes : integer{} = esize DIV 8; var base : bits(64); var addr : bits(64); var src : bits(VL); let pred : bits(PL) = P{}(g); let mask : bits(PL * nreg) = CounterToPredicate{}(pred[15:0]); let contiguous : boolean = TRUE; let nontemporal : boolean = TRUE; var transfer : integer = t; let tagchecked : boolean = n != 31; let accdesc : AccessDescriptor = CreateAccDescSVE(MemOp_STORE, nontemporal, contiguous, tagchecked); if !AnyActiveElement{PL*nreg}(mask, esize) then if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then CheckSPAlignment(); end; else if n == 31 then CheckSPAlignment(); end; end; base = if n == 31 then SP{64}() else X{64}(n); addr = AddressAdd(base, offset * nreg * elements * mbytes, accdesc); for r = 0 to nreg-1 do src = Z{VL}(transfer); for e = 0 to elements-1 do if ActivePredicateElement{PL*nreg}(mask, r * elements + e, esize) then Mem{esize}(addr, accdesc) = src[e*:esize]; end; addr = AddressIncrement(addr, mbytes, accdesc); end; transfer = transfer + tstride; end;

Operational information

This instruction is a data-independent-time instruction as described in About PSTATE.DIT.


2026-03_rel 2026-03-26 20:48:11

Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.