CPYFPWTN, CPYFMWTN, CPYFEWTN

Memory Copy Forward-only, writes unprivileged, reads and writes non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPWTN, then CPYFMWTN, and then CPYFEWTN.

CPYFPWTN performs some preconditioning of the arguments suitable for using the CPYFMWTN instruction, and performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTN performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFEWTN performs the last part of the memory copy.


Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be performed.


The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a memory copy only where there is no overlap between the source and destination locations, or where the source address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is IMPLEMENTATION DEFINED.


Note

Portable software should not assume that the choice of algorithm is constant.


After execution of CPYFPWTN, option A (which results in encoding PSTATE.C = 0):

After execution of CPYFPWTN, option B (which results in encoding PSTATE.C = 1):

For CPYFMWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

For CPYFMWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

For CPYFEWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

For CPYFEWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

Integer
(FEAT_MOPS)

313029282726252423222120191817161514131211109876543210
sz011001op10Rs110101RnRd
o0op2

Prologue (op1 == 00)

CPYFPWTN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWTN [<Xd>]!, [<Xs>]!, <Xn>!

Epilogue (op1 == 10)

CPYFEWTN [<Xd>]!, [<Xs>]!, <Xn>!

if !IsFeatureImplemented(FEAT_MOPS) || sz != '00' then UNDEFINED; CPYParams memcpy; memcpy.d = UInt(Rd); memcpy.s = UInt(Rs); memcpy.n = UInt(Rn); bits(4) options = op2; boolean rnontemporal = options<3> == '1'; boolean wnontemporal = options<2> == '1'; case op1 of when '00' memcpy.stage = MOPSStage_Prologue; when '01' memcpy.stage = MOPSStage_Main; when '10' memcpy.stage = MOPSStage_Epilogue; otherwise SEE "Memory Copy and Memory Set";

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY*.

Assembler Symbols

<Xd>

For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination address and is updated by the instruction, encoded in the "Rd" field.

For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the destination address, encoded in the "Rd" field.

<Xs>

For the prologue variant: is the 64-bit name of the general-purpose register that holds the source address and is updated by the instruction, encoded in the "Rs" field.

For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the source address, encoded in the "Rs" field.

<Xn>

For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of bytes to be transferred and is updated by the instruction to encode the remaining size and destination, encoded in the "Rn" field.

For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred, encoded in the "Rn" field.

For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the "Rn" field.

Operation

CheckMOPSEnabled(); CheckCPYConstrainedUnpredictable(memcpy.n, memcpy.d, memcpy.s); constant integer N = MaxBlockSizeCopiedBytes(); bits(8*N) readdata; memcpy.nzcv = PSTATE.<N,Z,C,V>; memcpy.toaddress = X[memcpy.d, 64]; memcpy.fromaddress = X[memcpy.s, 64]; memcpy.cpysize = SInt(X[memcpy.n, 64]); memcpy.implements_option_a = CPYFOptionA(); boolean rprivileged = if options<1> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0; boolean wprivileged = if options<0> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0; AccessDescriptor raccdesc = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal); AccessDescriptor waccdesc = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal); if memcpy.stage == MOPSStage_Prologue then if memcpy.cpysize<63> == '1' then memcpy.cpysize = 0x7FFFFFFFFFFFFFFF; if memcpy.implements_option_a then memcpy.nzcv = '0000'; // Copy in the forward direction offsets the arguments. memcpy.toaddress = memcpy.toaddress + memcpy.cpysize; memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize; memcpy.cpysize = 0 - memcpy.cpysize; else memcpy.nzcv = '0010'; memcpy.stagecpysize = MemCpyStageSize(memcpy); if memcpy.stage != MOPSStage_Prologue then CheckMemCpyParams(memcpy, options); integer copied; boolean iswrite; AddressDescriptor memaddrdesc; PhysMemRetStatus memstatus; memcpy.forward = TRUE; boolean fault = FALSE; integer B; if memcpy.implements_option_a then while memcpy.stagecpysize != 0 && !fault do // IMP DEF selection of the block size that is worked on. While many // implementations might make this constant, that is not assumed. B = CPYSizeChoice(memcpy); assert B <= -1 * memcpy.stagecpysize; (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress + memcpy.cpysize, memcpy.fromaddress + memcpy.cpysize, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; else memcpy.cpysize = memcpy.cpysize + B; memcpy.stagecpysize = memcpy.stagecpysize + B; else while memcpy.stagecpysize > 0 && !fault do // IMP DEF selection of the block size that is worked on. While many // implementations might make this constant, that is not assumed. B = CPYSizeChoice(memcpy); assert B <= memcpy.stagecpysize; (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress, memcpy.fromaddress, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; else memcpy.fromaddress = memcpy.fromaddress + B; memcpy.toaddress = memcpy.toaddress + B; memcpy.cpysize = memcpy.cpysize - B; memcpy.stagecpysize = memcpy.stagecpysize - B; UpdateCpyRegisters(memcpy, fault, copied); if fault then if IsFault(memaddrdesc) then AArch64.Abort(memaddrdesc.vaddress, memaddrdesc.fault); if IsFault(memstatus) then AccessDescriptor accdesc = if iswrite then waccdesc else raccdesc; HandleExternalAbort(memstatus, iswrite, memaddrdesc, B, accdesc); if memcpy.stage == MOPSStage_Prologue then PSTATE.<N,Z,C,V> = memcpy.nzcv;


Internal version only: aarchmrs v2023-12_rel, pseudocode v2023-12_rel, sve v2023-12_rel ; Build timestamp: 2023-12-15T16:46

Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.