Contiguous load non-temporal of words to multiple consecutive vectors (immediate index)
This instruction performs a contiguous non-temporal load of words to elements of two or four consecutive vector registers. The memory address is generated from a 64-bit scalar base and an immediate index; the index is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
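The address generation described above can be sketched in Python. This is an illustrative model only, not part of the architecture text: the function names `sign_extend` and `start_address` are hypothetical, and the address is assumed to wrap at 64 bits with no further checks. The scaling factor follows the operation pseudocode, where the signed immediate is multiplied by the number of registers and by the vector length in bytes.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def start_address(base: int, imm4: int, nreg: int, vl_bits: int) -> int:
    """Return the first byte address accessed (assumed to wrap at 64 bits).

    base    : value of Xn or SP
    imm4    : raw 4-bit immediate field from the encoding
    nreg    : 2 or 4, the number of consecutive vector registers
    vl_bits : current vector length in bits
    """
    offset = sign_extend(imm4, 4)
    vector_bytes = vl_bits // 8  # in-memory size of one vector
    return (base + offset * nreg * vector_bytes) & 0xFFFFFFFFFFFFFFFF
```

For instance, with a 256-bit vector length and two registers, an encoded immediate of 1 advances the base by 64 bytes, independent of which elements are active.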
It has encodings from 2 classes: Two registers and Four registers.
Two registers
| 31:20 | 19:16 | 15 | 14:13 | 12:10 | 9:5 | 4:1 | 0 |
| 1 0 1 0 0 0 0 0 0 1 0 0 | imm4 | 0 | 1 0 | PNg | Rn | Zt | 1 |
| | | | msz | | | | N |
if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then
    EndOfDecode(Decode_UNDEF);
end;
let n : integer = UInt(Rn);
let g : integer = UInt('1'::PNg);
let nreg : integer{} = 2;
let t : integer = UInt(Zt::'0');
let esize : integer{} = 32;
let offset : integer = SInt(imm4);
Four registers
| 31:20 | 19:16 | 15 | 14:13 | 12:10 | 9:5 | 4:2 | 1 | 0 |
| 1 0 1 0 0 0 0 0 0 1 0 0 | imm4 | 1 | 1 0 | PNg | Rn | Zt | 0 | 1 |
| | | | msz | | | | | N |
if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then
    EndOfDecode(Decode_UNDEF);
end;
let n : integer = UInt(Rn);
let g : integer = UInt('1'::PNg);
let nreg : integer{} = 4;
let t : integer = UInt(Zt::'00');
let esize : integer{} = 32;
let offset : integer = SInt(imm4);
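As an illustration of the two decode blocks above, the following Python sketch extracts the same variables from a 32-bit instruction word. It is a reading aid rather than a reference decoder: the function name `decode_fields` is hypothetical, the bit positions are an assumption read off the encoding diagrams (imm4 at 19:16, class selector at bit 15, msz at 14:13, PNg at 12:10, Rn at 9:5, Zt in the low bits, N at bit 0), and the feature checks are not modelled.

```python
def decode_fields(word: int) -> dict:
    """Decode a 32-bit word of this instruction into the pseudocode variables.

    Raises ValueError if the fixed bits do not match either encoding class.
    Bit positions are assumed from the encoding diagrams.
    """
    if (word >> 20) & 0xFFF != 0b101000000100:
        raise ValueError("fixed bits 31:20 do not match")
    if (word >> 13) & 0b11 != 0b10:
        raise ValueError("msz does not select words")
    if word & 1 != 1:
        raise ValueError("N bit does not select the non-temporal form")

    imm4 = (word >> 16) & 0xF
    png = (word >> 10) & 0x7
    rn = (word >> 5) & 0x1F

    if (word >> 15) & 1:          # four-register class
        if (word >> 1) & 1:
            raise ValueError("bit 1 must be 0 in the four-register class")
        nreg = 4
        t = ((word >> 2) & 0x7) << 2   # UInt(Zt::'00')
    else:                          # two-register class
        nreg = 2
        t = ((word >> 1) & 0xF) << 1   # UInt(Zt::'0')

    offset = imm4 - 16 if imm4 & 0x8 else imm4  # SInt(imm4)
    return {
        "nreg": nreg,
        "n": rn,
        "g": 8 + png,                  # UInt('1'::PNg), so PN8 to PN15
        "t": t,
        "esize": 32,
        "offset": offset,
    }
```

Because the low bits of the first register number are forced to zero, the first register is always a multiple of `nreg`, which matches the "Zt" times 2 and "Zt" times 4 numbering used in the operand descriptions.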
| <Zt2> | Is the name of the second scalable vector register to be transferred, encoded as "Zt" times 2 plus 1. |
| <PNg> | Is the name of the governing scalable predicate register PN8-PN15, with predicate-as-counter encoding, encoded in the "PNg" field. |
| <Xn|SP> | Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field. |
| <Zt4> | Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" times 4 plus 3. |
if IsFeatureImplemented(FEAT_SVE2p1) then
    CheckSVEEnabled();
else
    CheckStreamingSVEEnabled();
end;
let VL : integer{} = CurrentVL();
let PL : integer{} = VL DIV 8;
let elements : integer = VL DIV esize;
let mbytes : integer{} = esize DIV 8;
var base : bits(64);
var addr : bits(64);
let pred : bits(PL) = P{}(g);
let mask : bits(PL * nreg) = CounterToPredicate{}(pred[15:0]);
var values : array [[4]] of bits(VL);
let contiguous : boolean = TRUE;
let nontemporal : boolean = TRUE;
let transfer : integer = t;
let tagchecked : boolean = n != 31;
let accdesc : AccessDescriptor = CreateAccDescSVE(MemOp_LOAD, nontemporal, contiguous, tagchecked);

if !AnyActiveElement{PL*nreg}(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
    end;
else
    if n == 31 then CheckSPAlignment(); end;
end;

base = if n == 31 then SP{64}() else X{64}(n);
addr = AddressAdd(base, offset * nreg * elements * mbytes, accdesc);

for r = 0 to nreg-1 do
    for e = 0 to elements-1 do
        if ActivePredicateElement{PL*nreg}(mask, r * elements + e, esize) then
            values[[r]][e*:esize] = Mem{esize}(addr, accdesc);
        else
            values[[r]][e*:esize] = Zeros{esize};
        end;
        addr = AddressIncrement(addr, mbytes, accdesc);
    end;
end;

for r = 0 to nreg-1 do
    Z{VL}(transfer+r) = values[[r]];
end;
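The element loop of the operation above can be modelled with a short Python sketch. It makes several simplifying assumptions that are not part of the architecture text: memory is a flat `bytes` object indexed from address 0, data is little-endian, the predicate-as-counter value has already been expanded into one boolean per 32-bit element (the role played by `CounterToPredicate`), and alignment checks, faults, and the non-temporal hint are not modelled. The function name `ldnt1w_model` is hypothetical.

```python
def ldnt1w_model(memory: bytes, addr: int, active: list, nreg: int, vl_bits: int) -> list:
    """Return `nreg` lists of 32-bit lane values, one list per destination register.

    memory  : flat byte-addressable memory (assumed model, little-endian data)
    addr    : start address, already including the scaled immediate offset
    active  : one bool per element across all registers, in element order
    nreg    : 2 or 4
    vl_bits : vector length in bits
    """
    esize = 32
    elements = vl_bits // esize
    mbytes = esize // 8
    if len(active) != nreg * elements:
        raise ValueError("expected one predicate flag per element")

    registers = []
    for r in range(nreg):
        lanes = []
        for e in range(elements):
            if active[r * elements + e]:
                # Active element: read one word from memory.
                lanes.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
            else:
                # Inactive element: no memory access, lane is zeroed.
                lanes.append(0)
            # The address advances for every element, active or not.
            addr += mbytes
        registers.append(lanes)
    return registers
```

The sketch reflects two properties visible in the pseudocode: inactive elements are zeroed without touching memory, and the address still advances past them, so each active element is loaded from its own position in the contiguous block.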
This instruction is a data-independent-time instruction as described in About PSTATE.DIT.
2026-03_rel 2026-03-26 20:48:11
Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.