LD1B (scalar plus scalar, tile slice)

Contiguous load of bytes to 8-bit element ZA tile slice

This instruction performs a contiguous load of bytes to an 8-bit element ZA tile slice. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of 8-bit elements in a vector. The immediate offset is in the range 0 to 15. The memory address is generated by a 64-bit scalar base and an optional 64-bit scalar offset that is added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
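As a rough illustration of the slice selection described above, the following Python sketch computes the slice number from the slice index register value and the immediate offset. The function name `select_slice` and its parameters are hypothetical and not part of the architecture; `vl_bits` is assumed to be the current streaming vector length in bits.

```python
def select_slice(ws_value: int, offs: int, vl_bits: int) -> int:
    """Return the ZA tile slice number selected by LD1B (illustrative model).

    ws_value: contents of the slice index register (only the low 32 bits are used).
    offs:     immediate slice offset, 0 to 15.
    vl_bits:  assumed streaming vector length in bits.
    """
    if not 0 <= offs <= 15:
        raise ValueError("offs must be in the range 0 to 15")
    if vl_bits <= 0 or vl_bits % 8 != 0:
        raise ValueError("vl_bits must be a positive multiple of 8")
    dim = vl_bits // 8  # number of 8-bit elements in a vector
    return ((ws_value & 0xFFFFFFFF) + offs) % dim
```

With a 128-bit vector length there are 16 byte elements per vector, so a register value of 14 with an offset of 3 wraps around to slice 1.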

SME
(FEAT_SME)

Bits   Field  Value
31-24         11100000
23-22  msz    00
21            0
20-16  Rm
15     V
14-13  Rs
12-10  Pg
9-5    Rn
4             0
3-0    off4

Encoding

LD1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME) then EndOfDecode(Decode_UNDEF); end;
let n : integer = UInt(Rn);
let m : integer = UInt(Rm);
let g : integer = UInt('0'::Pg);
let s : integer = UInt('011'::Rs);
let t : integer = 0;
let offset : integer = UInt(off4);
let esize : integer{} = 8;
let vertical : boolean = V == '1';
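The decode step can be modelled in a few lines of Python. This is a minimal sketch, not an authoritative decoder: the function name `decode_ld1b_za` and the dictionary layout are hypothetical, and the fixed-bit check assumes the field positions of the encoding for this instruction (Rm in bits 20-16, V in bit 15, Rs in bits 14-13, Pg in bits 12-10, Rn in bits 9-5, off4 in bits 3-0).

```python
def decode_ld1b_za(word: int) -> dict:
    """Extract the decode variables of LD1B (tile slice) from a 32-bit word."""

    def bits(hi: int, lo: int) -> int:
        # Unsigned value of word[hi:lo], inclusive.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    # Fixed bits: 31-21 are 11100000000 and bit 4 is 0.
    if bits(31, 21) != 0b11100000000 or bits(4, 4) != 0:
        raise ValueError("not an LD1B (scalar plus scalar, tile slice) encoding")

    return {
        "n": bits(9, 5),                 # base register; 31 selects SP
        "m": bits(20, 16),               # offset register; 31 selects XZR
        "g": bits(12, 10),               # UInt('0'::Pg), so P0-P7
        "s": 0b01100 | bits(14, 13),     # UInt('011'::Rs), so W12-W15
        "t": 0,                          # the only 8-bit element tile is ZA0
        "offset": bits(3, 0),
        "esize": 8,
        "vertical": bits(15, 15) == 1,
    }
```

Prefixing Rs with '011' restricts the slice index register to W12-W15, and prefixing Pg with '0' restricts the governing predicate to P0-P7.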

Assembler Symbols

<HV>

Is the horizontal or vertical slice indicator, encoded in V:

V  <HV>
0  H
1  V

<Ws>

Is the 32-bit name of the slice index register W12-W15, encoded in the "Rs" field.

<offs>

Is the slice index offset, in the range 0 to 15, encoded in the "off4" field.

<Pg>

Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<Xn|SP>

Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<Xm>

Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the "Rm" field.
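The operand constraints listed above can be sketched as a small encoder that validates each assembler symbol and packs it into the instruction word. The function `encode_ld1b_za` and its argument names are hypothetical; registers are passed as plain numbers, with 31 standing for SP in the base position and for XZR in the offset position.

```python
def encode_ld1b_za(hv: str, ws: int, offs: int, pg: int, xn: int, xm: int = 31) -> int:
    """Pack LD1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>}] into 32 bits."""
    if hv not in ("H", "V"):
        raise ValueError("<HV> must be 'H' or 'V'")
    if not 12 <= ws <= 15:
        raise ValueError("<Ws> must be W12-W15")
    if not 0 <= offs <= 15:
        raise ValueError("<offs> must be in the range 0 to 15")
    if not 0 <= pg <= 7:
        raise ValueError("<Pg> must be P0-P7")
    if not 0 <= xn <= 31 or not 0 <= xm <= 31:
        raise ValueError("register numbers must be in the range 0 to 31")

    v = 1 if hv == "V" else 0
    return (
        (0b11100000000 << 21)  # fixed bits 31-21
        | (xm << 16)           # Rm; defaults to 31 (XZR) when <Xm> is omitted
        | (v << 15)            # V
        | ((ws - 12) << 13)    # Rs holds the low two bits of the W register number
        | (pg << 10)           # Pg
        | (xn << 5)            # Rn
        | offs                 # off4; bit 4 stays 0
    )
```

For example, `hex(encode_ld1b_za("H", 12, 0, 0, 0))` models `LD1B { ZA0H.B[W12, 0] }, P0/Z, [X0]` with the offset register left at its XZR default.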

Operation

CheckStreamingSVEAndZAEnabled();
let VL : integer{} = CurrentVL();
let PL : integer{} = VL DIV 8;
let dim : integer = VL DIV esize;
var base : bits(64);
var addr : bits(64);
let mask : bits(PL) = P{}(g);
var moffs : bits(64) = X{}(m);
let index : bits(32) = X{}(s);
let slice : integer = (UInt(index) + offset) MOD dim;
var result : bits(VL);
let mbytes : integer{} = esize DIV 8;
let contiguous : boolean = TRUE;
let nontemporal : boolean = FALSE;
let tagchecked : boolean = TRUE;
let accdesc : AccessDescriptor = CreateAccDescSME(MemOp_LOAD, nontemporal, contiguous, tagchecked);

if n == 31 then
    if (AnyActiveElement{PL}(mask, esize) ||
          ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE)) then
        CheckSPAlignment();
    end;
    base = SP{64}();
else
    base = X{64}(n);
end;

for e = 0 to dim - 1 do
    addr = AddressAdd(base, UInt(moffs) * mbytes, accdesc);
    if ActivePredicateElement{PL}(mask, e, esize) then
        result[e*:esize] = Mem{esize}(addr, accdesc);
    else
        result[e*:esize] = Zeros{esize};
    end;
    moffs = moffs + 1;
end;

ZAslice{VL}(t, esize, vertical, slice) = result;
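The data movement performed by the operation pseudocode can be approximated with a small Python reference model. It is a simplified sketch under several assumptions: the ZA0.B tile is a square list of lists indexed as `za[row][column]`, memory is a flat bytes-like object, the predicate is a list with one boolean per byte element, and streaming-mode checks, SP alignment checks and access descriptors are ignored. The name `ld1b_tile_slice` and all of its parameters are hypothetical.

```python
def ld1b_tile_slice(za, mem, pred, base, xm, ws, offs, vertical):
    """Model LD1B to a ZA0.B tile slice; modifies and returns `za`.

    za:       dim x dim list of lists of byte values (dim = VL DIV 8)
    mem:      bytes-like object indexed by address
    pred:     list of dim booleans, one per 8-bit element
    base, xm: base address and scalar offset (xm is 0 when <Xm> is XZR)
    ws, offs: slice index register value and immediate offset
    vertical: True for a vertical slice, False for a horizontal one
    """
    dim = len(za)
    slice_idx = ((ws & 0xFFFFFFFF) + offs) % dim

    result = []
    for e in range(dim):
        # Simplification: the 64-bit address wraps modulo 2**64.
        addr = (base + xm + e) % (1 << 64)
        # Inactive elements are zeroed and never touch memory.
        result.append(mem[addr] if pred[e] else 0)

    for e, value in enumerate(result):
        if vertical:
            za[e][slice_idx] = value   # one column of the tile
        else:
            za[slice_idx][e] = value   # one row of the tile
    return za
```

Because the conditional expression only indexes `mem` for active elements, an address that lies outside `mem` raises no error in this model when its predicate element is false, which mirrors the rule that inactive elements do not signal a fault.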

Operational information

This instruction is a data-independent-time instruction as described in About PSTATE.DIT.


2026-03_rel 2026-03-26 20:48:11

Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.