CPYFP, CPYFM, CPYFE

Memory copy forward-only

These instructions copy a requested number of bytes in memory from a source address to a destination address in a forward direction. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFP, then CPYFM, and then CPYFE.

CPYFP performs some preconditioning of the arguments suitable for using the CPYFM instruction, and copies an IMPLEMENTATION DEFINED portion of the requested number of bytes. CPYFM copies a further IMPLEMENTATION DEFINED portion of the remaining bytes. CPYFE copies any final remaining bytes.


Note

The ability to copy an IMPLEMENTATION DEFINED number of bytes allows an implementation to optimize how the bytes being copied are divided between the different instructions.


For more information on exceptions specific to memory copy instructions, see Memory Copy and Memory Set exceptions.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a memory copy only where there is no overlap between the source and destination locations, or where the source address is greater than or equal to the destination address.
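The overlap condition above can be illustrated with a small byte-level model. This is only a sketch, not the architectural behavior: it copies one byte at a time in ascending address order over a Python bytearray, and the helper names forward_copy_is_safe and forward_copy are hypothetical.

```python
def forward_copy_is_safe(dst: int, src: int, n: int) -> bool:
    """True if an ascending-address copy of n bytes reads every source byte intact."""
    no_overlap = dst + n <= src or src + n <= dst
    return no_overlap or src >= dst


def forward_copy(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Copy n bytes from src to dst, lowest address first (illustrative model only)."""
    for i in range(n):
        mem[dst + i] = mem[src + i]


# Source above destination, overlapping: the forward copy gives the expected result.
mem = bytearray(b"abcdefgh")
forward_copy(mem, dst=0, src=2, n=4)
assert mem == bytearray(b"cdefefgh")

# Destination above source, overlapping: early writes clobber bytes not yet read.
mem = bytearray(b"abcdefgh")
forward_copy(mem, dst=2, src=0, n=4)
assert mem == bytearray(b"abababgh")
```

In the second case the source bytes "cd" are overwritten before they are read, which is why a forward-only copy is unsuitable when the destination overlaps the source from above.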

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is IMPLEMENTATION DEFINED.


Note

Portable software should not assume that the choice of algorithm is constant.


For CPYFP:

On completion of CPYFP, option A:

On completion of CPYFP, option B:

For CPYFM, option A, when PSTATE.C = '0':

For CPYFM, option B, when PSTATE.C = '1':

For CPYFE, option A, when PSTATE.C = '0':

For CPYFE, option B, when PSTATE.C = '1':

Integer
(FEAT_MOPS)

Bits      Value
[31:30]   sz
[29:27]   011
[26]      0 (o0)
[25:24]   01
[23:22]   op1
[21]      0
[20:16]   Rs
[15:12]   0000 (op2)
[11:10]   01
[9:5]     Rn
[4:0]     Rd

Encoding for the Prologue variant

Applies when (op1 == 00)

CPYFP [<Xd>]!, [<Xs>]!, <Xn>!

Encoding for the Main variant

Applies when (op1 == 01)

CPYFM [<Xd>]!, [<Xs>]!, <Xn>!

Encoding for the Epilogue variant

Applies when (op1 == 10)

CPYFE [<Xd>]!, [<Xs>]!, <Xn>!

Decode for all variants of this encoding

    if !IsFeatureImplemented(FEAT_MOPS) || sz != '00' then
        EndOfDecode(Decode_UNDEF);
    end;

    var memcpy : CPYParams;
    memcpy.d = UInt(Rd);
    memcpy.s = UInt(Rs);
    memcpy.n = UInt(Rn);
    let options : bits(4) = op2;
    let rnontemporal : boolean = options[3] == '1';
    let wnontemporal : boolean = options[2] == '1';

    case op1 of
        when '00' => memcpy.stage = MOPSStage_Prologue;
        when '01' => memcpy.stage = MOPSStage_Main;
        when '10' => memcpy.stage = MOPSStage_Epilogue;
    end;
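As a rough illustration of the decode step, the following Python sketch extracts the same fields from a 32-bit instruction word. The bit positions are an assumption read off the encoding diagram (sz in [31:30], op1 in [23:22], Rs in [20:16], op2 in [15:12], Rn in [9:5], Rd in [4:0]), and the names CpyDecode, field, and decode_cpyf are hypothetical. Returning None stands in for EndOfDecode(Decode_UNDEF).

```python
from dataclasses import dataclass

STAGES = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}

# Fixed bits assumed from the diagram: [29:24] = 011001, [21] = 0, [11:10] = 01.
FIXED_MASK = 0x3F200C00
FIXED_VALUE = 0x19000400


@dataclass
class CpyDecode:
    stage: str
    d: int
    s: int
    n: int
    rnontemporal: bool
    wnontemporal: bool


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_cpyf(word: int, feat_mops: bool = True):
    """Decode a CPYF* word; None models Decode_UNDEF in the pseudocode."""
    if word & FIXED_MASK != FIXED_VALUE:
        raise ValueError("word is not in the assumed CPYF* encoding class")
    op1 = field(word, 23, 22)
    if op1 not in STAGES:
        raise ValueError("op1 == '11' is not one of the three variants modelled here")
    if not feat_mops or field(word, 31, 30) != 0b00:
        return None
    options = field(word, 15, 12)
    return CpyDecode(
        stage=STAGES[op1],
        d=field(word, 4, 0),
        s=field(word, 20, 16),
        n=field(word, 9, 5),
        rnontemporal=bool(options & 0b1000),  # options[3]
        wnontemporal=bool(options & 0b0100),  # options[2]
    )


# CPYFP [X0]!, [X1]!, X2! under the assumed layout.
decoded = decode_cpyf(0x19010440)
assert (decoded.stage, decoded.d, decoded.s, decoded.n) == ("Prologue", 0, 1, 2)
```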

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY* and Crossing a page boundary with different memory types or Shareability attributes.

Assembler Symbols

<Xd>

For the "Prologue" variant: is the 64-bit name of the general-purpose register that holds the destination address and is updated by the instruction, encoded in the "Rd" field.

For the "Epilogue" and "Main" variants: is the 64-bit name of the general-purpose register that holds an encoding of the destination address, encoded in the "Rd" field.

<Xs>

For the "Prologue" variant: is the 64-bit name of the general-purpose register that holds the source address and is updated by the instruction, encoded in the "Rs" field.

For the "Epilogue" and "Main" variants: is the 64-bit name of the general-purpose register that holds an encoding of the source address, encoded in the "Rs" field.

<Xn>

For the "Prologue" variant: is the 64-bit name of the general-purpose register that holds the number of bytes to be transferred and is updated by the instruction to encode the remaining size and destination, encoded in the "Rn" field.

For the "Main" variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred, encoded in the "Rn" field.

For the "Epilogue" variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred and is set to zero on completion of the instruction, encoded in the "Rn" field.

Operation

    CheckMOPSEnabled();
    CheckCPYConstrainedUnpredictable(memcpy.n, memcpy.d, memcpy.s);

    memcpy.nzcv = PSTATE.[N,Z,C,V];
    memcpy.toaddress = X{64}(memcpy.d);
    memcpy.fromaddress = X{64}(memcpy.s);
    memcpy.cpysize = SInt(X{64}(memcpy.n));
    memcpy.implements_option_a = CPYFOptionA();

    let rprivileged : boolean = (if options[1] == '1' then AArch64_IsUnprivAccessPriv()
                                 else PSTATE.EL != EL0);
    let wprivileged : boolean = (if options[0] == '1' then AArch64_IsUnprivAccessPriv()
                                 else PSTATE.EL != EL0);

    let raccdesc : AccessDescriptor = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal);
    let waccdesc : AccessDescriptor = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal);

    if memcpy.stage == MOPSStage_Prologue then
        if memcpy.cpysize[63] == '1' then
            memcpy.cpysize = ArchMaxMOPSBlockSize;
        end;

        if memcpy.implements_option_a then
            memcpy.nzcv = '0000';
            // Copy in the forward direction offsets the arguments.
            memcpy.toaddress = memcpy.toaddress + memcpy.cpysize;
            memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize;
            memcpy.cpysize = 0 - memcpy.cpysize;
        else
            memcpy.nzcv = '0010';
        end;
    end;

    memcpy.stagecpysize = MemCpyStageSize(memcpy);

    if memcpy.stage != MOPSStage_Prologue then
        CheckMemCpyParams(memcpy, options);
    end;

    var copied : integer;
    var iswrite : boolean;
    var memaddrdesc : AddressDescriptor;
    var memstatus : PhysMemRetStatus;

    memcpy.forward = TRUE;
    var fault : boolean = FALSE;
    var B : MOPSBlockSize = 0;

    if memcpy.implements_option_a then
        while memcpy.stagecpysize != 0 && !fault looplimit ArchMaxMOPSBlockSize do
            // IMP DEF selection of the block size that is worked on. While many
            // implementations might make this constant, that is not assumed.
            B = CPYSizeChoice(memcpy);
            assert B <= -1 * memcpy.stagecpysize;

            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress + memcpy.cpysize,
                                                                    memcpy.fromaddress + memcpy.cpysize,
                                                                    memcpy.forward, B,
                                                                    raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.cpysize = memcpy.cpysize + B;
                memcpy.stagecpysize = memcpy.stagecpysize + B;
            end;
        end;
    else
        while memcpy.stagecpysize > 0 && !fault looplimit ArchMaxMOPSBlockSize do
            // IMP DEF selection of the block size that is worked on. While many
            // implementations might make this constant, that is not assumed.
            B = CPYSizeChoice(memcpy);
            assert B <= memcpy.stagecpysize;

            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress,
                                                                    memcpy.fromaddress,
                                                                    memcpy.forward, B,
                                                                    raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.fromaddress = memcpy.fromaddress + B;
                memcpy.toaddress = memcpy.toaddress + B;
                memcpy.cpysize = memcpy.cpysize - B;
                memcpy.stagecpysize = memcpy.stagecpysize - B;
            end;
        end;
    end;

    UpdateCpyRegisters(memcpy, fault, copied);

    if fault then
        if IsFault(memaddrdesc) then
            AArch64_Abort(memaddrdesc.fault);
        end;
        if IsFault(memstatus) then
            let accdesc : AccessDescriptor = if iswrite then waccdesc else raccdesc;
            HandleExternalAbort(memstatus, iswrite, memaddrdesc, B, accdesc);
        end;
    end;

    if memcpy.stage == MOPSStage_Prologue then
        PSTATE.[N,Z,C,V] = memcpy.nzcv;
    end;
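The register bookkeeping of the two options can be easier to follow in a simplified model. The Python sketch below mirrors only the address and size arithmetic of the pseudocode. It omits faults, size saturation, parameter checks, and access descriptors. The per-stage shares (3 bytes for the prologue, 12 for the main stage) and the block size of 4 are arbitrary stand-ins for the IMPLEMENTATION DEFINED choices, and the names Regs, run_stage, and copy_forward are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Regs:
    xd: int      # destination register value
    xs: int      # source register value
    xn: int      # size register value (a signed integer in this model)
    c: int = 0   # PSTATE.C as written by the prologue


def run_stage(mem, r, stage, option_a, share=0, block=4):
    """Run one stage: 'P' (prologue), 'M' (main) or 'E' (epilogue)."""
    if stage == "P":
        r.c = 0 if option_a else 1
        if option_a:
            # Option A: move both addresses to the end and count up from -size.
            r.xd += r.xn
            r.xs += r.xn
            r.xn = -r.xn
    remaining = -r.xn if option_a else r.xn
    # The epilogue copies everything left; other stages copy their assumed share.
    todo = remaining if stage == "E" else min(share, remaining)
    while todo > 0:
        b = min(block, todo)
        base_d = r.xd + r.xn if option_a else r.xd
        base_s = r.xs + r.xn if option_a else r.xs
        for i in range(b):  # ascending addresses only
            mem[base_d + i] = mem[base_s + i]
        if option_a:
            r.xn += b
        else:
            r.xd += b
            r.xs += b
            r.xn -= b
        todo -= b


def copy_forward(mem, dst, src, n, option_a):
    """Run prologue, main, and epilogue in order; also report the post-prologue state."""
    r = Regs(xd=dst, xs=src, xn=n)
    run_stage(mem, r, "P", option_a, share=3)
    after_prologue = (r.xd, r.xs, r.xn, r.c)
    run_stage(mem, r, "M", option_a, share=12)
    run_stage(mem, r, "E", option_a)
    return after_prologue, r


for option_a in (True, False):
    mem = bytearray(range(32)) + bytearray(32)
    after_prologue, r = copy_forward(mem, dst=32, src=0, n=20, option_a=option_a)
    assert mem[32:52] == bytes(range(20))
    assert (r.xd, r.xs, r.xn) == (52, 20, 0)
```

In this model the register values between stages differ by option: option A leaves the addresses pointing past the end of the buffers with a negative count, while option B advances the addresses and counts down. Both reach the same final register state.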


2026-03_rel 2026-03-26 20:48:11

Copyright © 2010-2026 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.