
Solidity Tutorial: All About Memory | by Jean Cvllr | Sep, 2022


Understanding EVM memory

photo by Mech Mind on Unsplash

This is Part II of the “All About Data Locations” sub-series.

We will learn the layout of the EVM memory, its reserved spaces, the free memory pointer, how to use memory references to read and write from/to memory and the conventional best practices when working with memory.

We will use code snippets of the contracts from the Ethereum Name Service (ENS) to support this article with meaningful examples. This will help us understand better how the smart contracts behind this popular project work under the hood.

  • Introduction
  • The EVM Memory — Overview
  • Layout of Memory
  • Basics of Memory
  • Reading from Memory (MLOAD)
  • Writing to Memory (MSTORE + MSTORE8)
  • Knowing the Memory Size (MSIZE)
  • The Free memory pointer
  • memory references as function parameters
  • memory references inside function bodies
  • The Memory Expansion cost
  • Memory between contract calls
  • Conclusion

In the introductory article “All About Data Locations”, I describe the EVM as an industrial factory. In some parts of a factory, you will find machines and robots controlled by operators.

These machines break down large pieces of steel/aluminium that cannot be processed (e.g., fit through the furnace door) into smaller chunks.

We can use the same example for Ethereum. The EVM operates as a stack machine on 32-byte words. When the EVM encounters data larger than 32 bytes (complex types like string, bytes, struct or arrays), it cannot process them on the stack because these items are too large.

Therefore, the EVM needs to take this data and process it elsewhere. It has a dedicated place for this: the memory. By putting such variables in memory, the EVM can then deliver them to the stack in smaller chunks, one after the other.

The EVM memory is also used for complex operations built into Solidity, like ABI encoding, ABI decoding or hashing via keccak256. For these specific cases, imagine that the memory acts as a scratchpad or a whiteboard for the EVM.

A teacher or a scientist might use a whiteboard to write things on it to solve problems. The same applies to the EVM. The EVM uses the memory as a scratchpad to perform these operations or computations and return the final value.

source: https://giphy.com/explore/physics-lecture

For abi.decode(...) or keccak256, the memory is the source of inputs. For abi.encode(...), the memory is where the output will be stored.

The EVM memory has 4 main characteristics:

  • cheaper = costs less gas than storage
  • mutable = can be overwritten and changed
  • relative to transactions = created by function calls or by the constructor (= contract creation)
  • short term = not persistent: each external function call gets a fresh instance.

The EVM memory is a byte-addressable space. All the bytes in it are initially empty (defined as zero). It is a mutable data area, meaning you can read from and write to it. Like calldata, memory is addressed by byte indexes, but we will see in the section “Interacting with Memory” that you can only read 32-byte words at a time in memory.

The EVM memory is also volatile. The values stored in memory do not persist between external calls.

A freshly cleared and new memory instance is obtained when a contract calls another contract.

The memory is not erased and cleared per se. Each new instance of the EVM memory is specific to an execution context, the current contract execution.

Therefore, you should remember that the EVM memory is specific to both 1) a message call and 2) the execution environment of the contract being called. We will explain this concept in more detail in a separate section.

Layout of Memory

Memory is linear and can be addressed at byte level.

Think of memory as a very large (even huge!) array of bytes, like byte[]


When you interact with the EVM memory, you read from or write to (what I call) “memory blocks” that are 32 bytes long.

Reserved spaces

The first four 32-byte words in memory are reserved for different purposes:

  • first 2 words (offsets 0x00 to 0x3f): scratch space for hashing functions
  • 3rd word (offsets 0x40 to 0x5f): free memory pointer
  • 4th word (offsets 0x60 to 0x7f): the zero slot, meant to be zero permanently and used as the initial value for empty dynamic memory arrays

The free memory pointer (located at offset 0x40) is the most crucial part of the EVM memory. It must be handled with care, especially in assembly/Yul. We will cover it in a separate section.

See section “Layout of Memory” in the Solidity docs for more information.
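The reserved words can be inspected directly with inline assembly. The following is a minimal sketch (the contract and function names are hypothetical, chosen only for illustration). In a fresh call with no prior memory allocation, it should typically return 0x80 (128) for the free memory pointer and 0 for the zero slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract ReservedSlots {
    function inspect()
        public
        pure
        returns (uint256 freeMemoryPointer, uint256 zeroSlot)
    {
        assembly {
            // 3rd reserved word (offset 0x40): the free memory pointer
            freeMemoryPointer := mload(0x40)
            // 4th reserved word (offset 0x60): the zero slot
            zeroSlot := mload(0x60)
        }
    }
}
```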

Maximum memory limit

We saw that the EVM memory is a linear array, addressable through a byte index (called an offset). Up to how many bytes can it contain?

How big is this array? How big is the EVM memory?

The answer to this lies in the geth source code (screenshot below). Look at the conversion type used.

source: instructions.go (geth client source code).

We can see from this screenshot of the geth client that mStart.Uint64() converts the memory offset to a uint64 value. This means that, in theory, the highest offset you can address in memory is the maximum value of a uint64 number. In practice, the gas cost of memory expansion makes this limit unreachable.

If the offset specified is more than that, it will revert.

Basics of Memory

You can use the memory keyword only inside functions, not outside functions at the contract level.

The following data and values are always in memory by default:

  • Function arguments of complex types.
  • Local variables (inside function bodies) of complex types.
  • Values returned from functions, regardless of their type (this is done via the return opcode).
  • Any complex value type returned by a function must specify the keyword memory

By variables/values of complex types, we refer to variables such as struct, arrays, bytes and strings.

These variables defined with the keyword memory will disappear once the function call ends. This is what we meant previously by “not persisting”.

The reason is that memory tells Solidity to create a chunk of space for the variable at runtime, guaranteeing its size and structure for future use in that function during the function execution.

Interacting with memory — Overview

The Solidity docs state that in the EVM memory:

…reads are limited to a width of 256 bits, while writes can be either 8 bits or 256 bits wide.

If we look at the Yellow Paper, we can see that one opcode is defined to read from memory (MLOAD) and two opcodes are defined to write to memory: MSTORE and MSTORE8.

source: Ethereum Yellow Paper, page 34

Reading from Memory (MLOAD)

You can read from memory using the MLOAD opcode.

Yellow Paper formula

Here is what the yellow paper says for the MLOAD opcode specification.

Let’s demystify this very formal formula!

The formula in the yellow paper can be explained as follows:

  • Us[0] = top item on the stack.
  • Us'[0] = resulting item that is put on top of the stack.
  • Um = content in memory starting at a specific offset.

The formula Us'[0] = Um[Us[0] ... (Us[0] + 31)] can be translated into plain English as follows:

  1. take the top item on the stack, Us[0].
  2. use this value as the starting pointer to read in memory Um (= offset).
  3. read 32 bytes from memory: the byte at this pointer Us[0] plus the 31 bytes that follow (up to Us[0] + 31), and put the result on top of the stack.

Reading from memory can only be done one 32-byte word at a time. This means you can only read 32 bytes from memory at a time with the mload opcode.

These opcodes can be used in Solidity inline assembly or standalone Yul code.
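As a minimal sketch of mload in inline assembly (the contract and function names are hypothetical), the function below reads two consecutive words from a bytes array in memory: the word the variable points to, which holds the length, and the first 32 bytes of content that follow it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract MloadExample {
    function read(bytes memory data)
        public
        pure
        returns (uint256 length, bytes32 firstWord)
    {
        assembly {
            // `data` is a memory pointer: the word it points to holds the length
            length := mload(data)
            // the content starts 32 bytes (0x20) after the length word.
            // If `data` is shorter than 32 bytes, the trailing bytes of this
            // word are whatever follows the content in memory.
            firstWord := mload(add(data, 0x20))
        }
    }
}
```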

Example: SHA1 library from ENS contracts

image source: ens-domains.com

Let’s look at an example in the ENS contracts: the SHA1.sol library.

In the code snippet below, the mload opcode is used twice.

  • first to retrieve the free memory pointer. The scratch variable is then used as a pointer in memory where the sha1 hash of data will be computed and written.
  • second to retrieve the data variable’s length (= number of bytes).
source: ENS source code on Github, SHA1.sol library

Writing to Memory (MSTORE + MSTORE8)

You can write to the memory using one of the two following opcodes:

  • MSTORE → write a word (= 32 bytes) in memory.
  • MSTORE8 → write a single byte in memory

This tweet shows exactly how the EVM instance in the geth client takes the arguments off the stack as input for MSTORE.

source: faheel on Twitter

in Solidity

In Solidity, whenever you instantiate a variable with the memory keyword and assign a value (a literal bytes/string, or the return value of a function), the EVM under the hood performs an mstore instruction.

Here is an example from the DNSRegistrar.sol contract from ENS.

source: ENS source code on Github, DNSRegistrar.sol

in assembly

The mstore opcode can be used in inline assembly. It accepts two arguments:

  • the offset in memory to write to.
  • the data to write in memory
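The difference between the two write opcodes can be sketched as follows (hypothetical contract and function names). The function writes a full word with mstore, then overwrites a single byte of it with mstore8, and returns the word as it looked after each step.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract MstoreExample {
    function write()
        public
        pure
        returns (bytes32 afterMstore, bytes32 afterMstore8)
    {
        assembly {
            // use free memory as a temporary scratch area; the pointer is not
            // advanced because nothing written here is kept after the block
            let ptr := mload(0x40)

            // mstore(offset, value): writes 32 bytes, value right-aligned
            mstore(ptr, 0xcafe)
            afterMstore := mload(ptr) // 0x0000...0000cafe

            // mstore8(offset, value): writes only the single byte at `ptr`
            mstore8(ptr, 0xff)
            afterMstore8 := mload(ptr) // 0xff00...0000cafe
        }
    }
}
```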

See how mstore is used in assembly in the same ENS contract SHA1.sol.

source: ENS source code on Github, SHA1.sol library.

Knowing the Memory Size (MSIZE)

For more details on the MSIZE opcode, see the opcode explanation on evm.codes

At first guess, the EVM opcode MSIZE sounds by its name like it will return how much data is stored in memory. Or in other words, how many bytes are currently written in memory.

The MSIZE opcode is a bit more complex. The C++ source code of the Solidity compiler provides more information to understand it. See below.

source: SemanticInformation.cpp on Solidity Github repository

The MSIZE opcode returns the size of active memory in bytes, that is, the highest byte offset accessed in memory in the current execution environment, rounded up to a full word. The size is therefore always a multiple of a word (32 bytes).

But in Solidity, what is the difference between “how many bytes are stored in memory” and “the largest index/offset accessed in memory”?

We will illustrate with a practical example using Solidity itself! Look at the code snippet below.
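The original snippet is not reproduced in this text. The following is a minimal reconstruction, consistent with the steps described in this section, so the line numbers mentioned in the text refer to the original snippet and will not match. The contract name is hypothetical, and the values 128, 192 and 224 are the ones reported by the article; they assume compilation without the optimizer (msize cannot be used when the Yul optimizer is enabled).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical reconstruction, for illustration only.
contract MsizeExample {
    function test()
        public
        pure
        returns (
            uint256 freeMemBefore,
            uint256 freeMemAfter,
            uint256 memorySize
        )
    {
        // step 1: free memory pointer before any allocation
        assembly {
            freeMemBefore := mload(0x40)
        }

        // step 2: 32 bytes for the length + 32 bytes of content = 64 bytes
        bytes memory data = hex"cafecafecafecafecafecafecafecafecafecafecafecafecafecafecafecafe";

        assembly {
            // the compiler moved the free memory pointer past `data`
            freeMemAfter := mload(0x40)

            // step 3: read one word at the free memory pointer, an area that
            // was never accessed before; this triggers a memory expansion
            pop(mload(freeMemAfter))

            memorySize := msize()
        }
    }
}
```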

What is happening here?

step 1: freeMemBefore first returns the free memory pointer: 0x80 (= 128)

step 2: we then write data in memory (64 bytes). The free memory pointer gets updated (freeMemAfter) to become 0xc0 (= 192).

NB: in the example above, the free memory pointer gets updated automatically only because we are outside the assembly block. If you write to memory in assembly via mstore or through similar opcodes that write to memory like calldatacopy, the free memory pointer is not updated automatically. You are responsible for updating it manually yourself.

Remember the rules mentioned by the Solidity docs: “Inline assembly might have a quite high-level look, but it is extremely low-level”.

At this point, there are technically 192 bytes in total allocated in memory.

  32 bytes 
x 4 (the first 4 reserved spaces in memory)
---------------------
= 128
+ 64 bytes (the variable `data`)
---------------------
= 192 (total)

Now pay attention to line 28. We try to read in memory at offset 0xc0 (192).

step 3: when we do msize (line 31), we obtain the number 224 (= 0xe0). What just happened? There are only 192 bytes stored/allocated in memory in total. Where does this 224 come from?

224 = 192 + 32. So the value returned by msize is the total number of bytes stored in memory (192) + 32: reading the word at offset 0xc0 touched bytes 192 to 223, which had never been accessed before. We have just triggered and witnessed a memory expansion. The memory always expands one 32-byte word at a time.

There is no better explanation of the msize opcode than the one from evm.codes to summarize.

What msize tracks is the highest offset ever accessed in the current execution. A first write or read to a bigger offset will trigger a memory expansion.

The Free memory pointer

In their popular article series “Deconstructing a Smart Contract”, OpenZeppelin reveals the meaning of the opcodes behind the first 5 bytes of each smart contract.

0x6080604052...
source: OpenZeppelin, deconstructing a smart contract (Part I)

In a nutshell, this sequence of opcodes stores the number 0x80 (decimal 128) into memory at position 0x40 (decimal 64). What for?

As explained in the previous section “Layout of Memory”, the first 4 words in memory are reserved for specific purposes. The 3rd word — located in memory at position 0x40 — is called the free-memory pointer.

OpenZeppelin describes the free-memory pointer as “a reference to the first unused word in memory”. It tells us where in memory (at which offset) there is free space to write data to. This avoids overwriting data already present in memory.

The free memory pointer is one of the most important things to know about the EVM.

Free memory pointer in Solidity

In Solidity, the free memory pointer is fetched and updated automatically when you create and assign a memory variable (for example, a variable declared as bytes memory myVariable).

Let’s look at an example. For the Solidity code:


These are the opcodes generated by the Solidity compiler. What interests us is how the free memory pointer is fetched and updated, from instruction 056 to instruction 065.

a basic sequence of opcode to write a string memory.

When a string or some data is written to memory in Solidity, the EVM always performs the following initial two steps:

Step 1: Get the free memory pointer.

The EVM first loads the free memory pointer from memory position 0x40. The value returned by mload is 0x80. Our free memory pointer tells us that the first place in memory where there is free space to write is at offset 0x80. This is what is on top of our stack at the end.


Step 2: Allocate memory + update with the new free memory pointer.

The EVM will now reserve this position in memory for the string test. It keeps the value returned by the free memory pointer locally on the stack.

But the Solidity compiler is smart and safe! After allocating and before writing any value in memory, it always updates the free memory pointer. This is to point to the next free space in memory.

According to the ABI specification, a string is made of two parts: the length + the string itself. The next step is then to update the free memory pointer. What the EVM says here is “I am going to write two 32-byte words in memory. So the new free memory pointer will be 64 bytes further from the current one”.

What the opcodes below do is simple. They:

  1. duplicate the current value of the free memory pointer = 0x80
  2. add 0x40 to it (= 64 in decimal, for 64 bytes)
  3. push 0x40 (= the location of the free memory pointer again) onto the stack
  4. update the free memory pointer with the new value via MSTORE

Free memory pointer in Assembly

In inline assembly, the free memory pointer must be handled with care!

Not only does it have to be fetched manually, but it also has to be updated manually!

Therefore, you must be careful when handling memory in assembly. You must always fetch the free memory pointer first, and write to the location in memory it points to, if you don’t want to end up overwriting something in memory that already had some content.

Once you have written to memory, you must update the free memory pointer with the new free memory offset.
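The fetch, write, update sequence can be sketched as follows. This is a minimal illustration with hypothetical names, not code from any of the libraries mentioned in this article: it builds a 4-byte bytes value by hand and advances the free memory pointer past it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract ManualAllocation {
    function build() public pure returns (bytes memory result) {
        assembly {
            // 1. fetch the free memory pointer: this is where we may write
            result := mload(0x40)

            // 2. write the length (4 bytes), then the content, left-aligned
            //    in the next word
            mstore(result, 4)
            mstore(add(result, 0x20), shl(224, 0xcafecafe))

            // 3. update the free memory pointer: 32 bytes for the length
            //    + 32 bytes reserved for the content
            mstore(0x40, add(result, 0x40))
        }
    }
}
```

Calling build() should return the bytes value 0xcafecafe. Skipping step 3 would leave the pointer on the length word, so the next allocation made by Solidity could overwrite it.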

In conclusion, when it comes to the free memory pointer, always remember OpenZeppelin’s suggestion:

“You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.”

NB: it might not be good practice to write to the location pointed to by the free memory pointer without first checking what is actually stored there.

Example: solidity-bytes-utils library from Gonçalo Sá

Let’s look at this popular Solidity library used to manipulate bytes. If you look carefully at the initial assembly code of each function, you will see that loading the free memory pointer is the first thing happening.

At the end of the function, tempBytes is returned. At the low level, this can be translated as “return what is present in memory at the memory offset pointed to by tempBytes”.

source: GNSPS/solidity-bytes-utils on Github, BytesLib.sol

memory references as function parameters

We use a memory reference as a parameter every time in Solidity when we have to pass an argument of dynamic or complex type to a function.

For example in the ENS contract, the claim(...) function from DNSRegistrar.sol takes two arguments: name and proof, both memory references.


But what does a memory reference as a function parameter translate to for the EVM? Let’s use a basic Solidity example.

function test(string memory input) public {
// ...
}

When a memory reference is passed to a function as a parameter, the EVM bytecode of the function performs 4 main steps in sequence:

  1. Load the string offset from the calldata to the stack: to know where the string starts inside the calldata.
  2. Load the string length onto the stack: will be used to know how much data to copy from the calldata.
  3. allocate some memory space to move the string out of the calldata into memory: this is the same as described in the “free memory pointer”.
  4. transfer the string from the calldata to memory using the opcode calldatacopy.
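These steps can be approximated by hand in inline assembly. The sketch below is an assumption-laden illustration with hypothetical names, not the bytecode the compiler emits. It takes the argument as calldata so that Solidity does not copy it automatically, then performs the allocation and the calldatacopy manually.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract CalldataToMemory {
    function copy(bytes calldata input)
        public
        pure
        returns (bytes memory output)
    {
        assembly {
            // steps 1 and 2: `input.offset` and `input.length` give where the
            // content starts in the calldata and how many bytes it contains

            // step 3: allocate memory, starting at the free memory pointer
            output := mload(0x40)
            mstore(output, input.length)

            // step 4: copy the content from the calldata into memory,
            // right after the length word
            calldatacopy(add(output, 0x20), input.offset, input.length)

            // advance the free memory pointer, rounding the content size up
            // to a multiple of 32 bytes
            let padded := and(add(input.length, 31), not(31))
            mstore(0x40, add(add(output, 0x20), padded))
        }
    }
}
```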

I have put the detailed opcodes below. You can also look at my Github repository to understand this in more detail.

source: All About Solidity — Memory (Github repository)

memory references inside function bodies

Let’s look at the following bare-bones example.

function test() public {
    uint256[] memory data;
}

The question to ask is what does the variable data contain?

It might be tempting to answer “an empty array of uint256 numbers”. But don’t be fooled or misled by the syntax. This is Solidity, not Javascript or Typescript!

In Typescript, you would think of an array variable as holding the array itself.

However, the keyword memory changes it all here!

Let’s refresh our memory: in the introductory article “All About Data Locations”, we described variables with the keyword storage, memory or calldata as reference type variables.

So when you see a variable inside a Solidity function with the keyword memory , you are dealing with a reference to a location in memory.

Therefore, the variable data above does not hold an array, but instead holds a pointer to a location in memory. The Solidity docs describe this well:

Local variables that refer to memory evaluate to the address of the variable in memory not the value itself.

And Solidity goes even further in the explanation!

Such variables can also be assigned to, but note that an assignment will only change the pointer and not the data

Let’s look at another example to better understand.

function test() public pure returns (bytes memory) {
    bytes memory data;
    bytes memory greetings = hex"cafecafe";
    data = greetings;
    data[0] = 0x00;
    data[1] = 0x00;
    return greetings;
}

One might think that the variable greetings is safe here, and that this function will return 0xcafecafe. But this is a wrong assumption: if you run this function, it returns 0x0000cafe instead.

the surprise and wrong assumption that memory references give.

In reality, what happened under the hood is that we create two pointers to memory, named by the variables data and greetings.

When we do data = greetings , we think we are assigning the value cafecafe to the variable data. But we are not assigning anything at all here! We are giving the following instruction to the EVM:

“variable data, I order you to point to the same location in memory that the variable greetings points to!” 🫡 🪖
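To make the contrast concrete, here is a minimal sketch (hypothetical contract and function names) that puts the aliasing behaviour next to an explicit copy. In the second function, a new array is allocated and filled byte by byte, so the two variables point to different locations in memory.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract MemoryReferences {
    // `data` and `greetings` point to the same location in memory
    function aliasing() public pure returns (bytes memory) {
        bytes memory greetings = hex"cafecafe";
        bytes memory data = greetings; // copies the pointer, not the bytes
        data[0] = 0x00;
        return greetings; // 0x00fecafe: greetings was modified through data
    }

    // `data` points to its own, newly allocated location in memory
    function copying() public pure returns (bytes memory) {
        bytes memory greetings = hex"cafecafe";
        bytes memory data = new bytes(greetings.length);
        for (uint256 i = 0; i < greetings.length; i++) {
            data[i] = greetings[i];
        }
        data[0] = 0x00;
        return greetings; // 0xcafecafe: greetings is untouched
    }
}
```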

Allocating new elements in memory

We saw in the previous section that we can allocate some space in memory for variables and write directly into it by assigning a value to the variable.

We can also allocate some space in memory without writing to it immediately by using the new keyword.

This is mainly used when instantiating complex types like arrays inside functions.

When arrays are created with the new keyword, the array length must be specified in parentheses. The length can be computed at runtime, but once created, a memory array has a fixed size: it cannot be resized (for example, .push is not available).

uint[] memory data = new uint[](3);

For structs, the new keyword is not required.
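A short sketch of both cases, using hypothetical names: a memory array whose length is only known at runtime, and a struct created in memory without the new keyword.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract NewInMemory {
    struct Point {
        uint256 x;
        uint256 y;
    }

    function build(uint256 length)
        public
        pure
        returns (uint256[] memory numbers, Point memory point)
    {
        // the length is chosen at runtime, but is fixed once the array exists;
        // all elements start at zero
        numbers = new uint256[](length);
        if (length > 0) {
            numbers[0] = 42;
        }

        // structs are created in memory without `new`
        point = Point(1, 2);
    }
}
```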

Copying from a storage reference variable

Let’s continue with the following Solidity example.

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

contract Playground {
    bytes storageData = hex"C0C0A0C0DE";

    function test() public {
        bytes memory data = storageData;
    }
}

In this scenario, we are copying a storage reference (= right side of the = sign), to a memory reference (= left side of the = sign). Two things happen here:

  1. new memory is allocated, and the variable data will point to a new location in memory.
  2. The hexadecimal value 0xC0C0A0C0DE is loaded from storage and copied into memory at the memory location pointed to by data.
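One way to observe that a real copy takes place is to modify the memory variable and compare it with storage afterwards. The variant below is a sketch: the return values and the modification are additions made for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Playground {
    bytes storageData = hex"C0C0A0C0DE";

    // Illustrative variant of test(): the return values are an addition.
    function test()
        public
        view
        returns (bytes memory copyInMemory, bytes memory stillInStorage)
    {
        bytes memory data = storageData; // storage -> memory: a copy is made
        data[0] = 0x00; // only the copy in memory changes

        // expected: 0x00c0a0c0de for the copy, 0xc0c0a0c0de for storage
        return (data, storageData);
    }
}
```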

The Memory Expansion cost

For more details about the memory expansion cost, read evm.codes

The Solidity docs state the following:

Memory is expanded by a word (256-bit), when accessing (either reading or writing) a previously untouched memory word (i.e. any offset within a word).

At the time of expansion, the cost in gas must be paid. Memory is more costly the larger it grows (it scales quadratically).

In fact, the memory is said to “expand” every time we write a new word in memory that has not been previously used (had some data in it) or accessed (via mload).

Why does memory expansion matter? Because the larger memory grows, the more gas it will consume each time you interact with it.

When you write to memory via mstore (or mstore8), some gas is used for these two opcodes. But the gas cost for writing to memory does not only depend on how much data you are writing to memory. It also depends on the actual memory size, what is known in the EVM shadow developers community as the “memory expansion cost”.

In addition to the cost of writing to memory, there is always an additional cost related to how much the memory expands.

The memory expansion cost increases in the following way:

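  • linearly for the first 724 bytes.
  • quadratically after that: the cost then includes a term that grows with the square of the memory size (in 32-byte words), so doubling the memory size roughly quadruples that part of the cost.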

The memory expansion cost also increases with simple memory read operations when higher offsets are accessed in memory via the mload opcode.
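The formula documented on evm.codes and in the Yellow Paper can be written as a small helper. This is a sketch with hypothetical names that only reproduces the arithmetic; it does not measure real gas usage. The total memory cost for a size of `words` 32-byte words is 3 * words + words² / 512 (integer division), and an expansion is charged the difference between the new and the old total.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper, for illustration only.
contract MemoryCost {
    // total gas cost of a memory of `words` 32-byte words
    function memoryCost(uint256 words) public pure returns (uint256) {
        // linear term + quadratic term (integer division)
        return 3 * words + (words * words) / 512;
    }

    // gas charged when memory grows from `oldWords` to `newWords`
    function expansionCost(uint256 oldWords, uint256 newWords)
        public
        pure
        returns (uint256)
    {
        if (newWords <= oldWords) {
            return 0; // no expansion, no extra cost
        }
        return memoryCost(newWords) - memoryCost(oldWords);
    }
}
```

For example, memoryCost(22) is 66 (the quadratic term is still 0), memoryCost(23) is 70, and memoryCost(1024), i.e. 32 KiB of memory, is 5120. The quadratic term reaches 1 at about 22.6 words, which is roughly the 724 bytes quoted above.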

Memory between contract calls

There is an important concept to be aware of regarding the EVM Memory and smart contracts. The Solidity docs state this well:

… a contract obtains a freshly cleared instance (of memory) for each message call.

This helps us understand one of the main features of the EVM Memory: a clear instance of memory is obtained between external calls.

In fact, an instance of the EVM memory is specific to each contract and to the current execution context. This means that a freshly cleared and empty memory is obtained on each new contract interaction.

Let’s examine in practice how a clear memory instance is obtained on each new external call. We will use these two contracts as an example:
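The two contracts are not reproduced in this text. The following is a minimal reconstruction based on the names used in this section (Source, Target, callTarget() and doSomething()). The body of doSomething() and the counter variable are assumptions, and line numbers will not match the original screenshots.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical reconstruction, for illustration only.
contract Target {
    uint256 public counter; // assumed state, not from the article

    // state-changing on purpose, so that the caller uses the CALL opcode
    function doSomething() public {
        counter++;
    }
}

contract Source {
    Target target;

    constructor(Target _target) {
        target = _target;
    }

    function callTarget() public {
        // external call: Target runs with its own, fresh memory instance
        target.doSomething();
    }
}
```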

Using these two basic contracts, we can use the Source contract to interact with a Target contract. Let’s deploy and debug them in Remix.

  1. Open the Remix IDE, create a new file and copy the Solidity code above.
  2. Compile the file without the optimizer enabled or any number of runs.
  3. Deploy the Target contract first.
  4. Deploy the Source contract second, giving the address of the Target contract previously deployed as a constructor argument.
  5. On the Source contract, run the function callTarget().
  6. In the console, click on “Debug” to debug each opcode of the transaction.

As you debug and go through each opcode, you should see that the EVM memory is filled with data at various offsets. One of them in particular at offset 0x80 shows the value 0x826926790000000000000000... . This is the function selector of the function doSomething() on the Target contract.

We can see here that before the external contract call, the memory is filled with data

We can see in the screenshot above the execution context. The debugger has highlighted line 12 of the code, the external call target.doSomething().

Now pay attention to the next step! If you click on the blue arrow button to jump to the next opcode to debug, like magic, the memory is cleared and becomes empty!

Look at the memory toggle stating “No data available”

As you can see from the screenshot above, the “Memory” field on the left sidebar now displays “No data available”. What just happened?

The CALL opcode made the EVM change the execution environment. We are now running the EVM in a new execution context: the context of the Target contract. As you can see from above, the function doSomething() is now highlighted, providing an additional clue of this new execution context switching.

Below is a summary of the opcodes for this external call in Solidity. I have omitted some opcodes for brevity and explained what happens as comments.

As a brief explanation, the EVM will generate the calldata by pushing the function selector of doSomething() (being 0x82692679) on the stack and shifting it left to prepare the calldata, so as to have these 4 bytes as the function selector in the calldata.

The calldata payload to be sent is then stored in memory at the location retrieved by the free memory pointer.

Finally, the CALL opcode will call the external contract address, fetched initially from the contract storage (at instruction number 057) and will send along the calldata by fetching it from memory (where it was previously written to).
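The same sequence can be approximated by hand in inline assembly. The sketch below is not the compiler's output: it leaves out the checks Solidity adds around external calls (such as verifying that the target has code), takes the target address as a parameter instead of reading it from storage, and uses hypothetical names.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract LowLevelCaller {
    function callDoSomething(address target) public returns (bool success) {
        assembly {
            // location of free memory, used to hold the calldata payload
            let ptr := mload(0x40)

            // selector of doSomething(), shifted left by 224 bits so that
            // its 4 bytes sit at the start of the 32-byte word
            mstore(ptr, shl(224, 0x82692679))

            // call(gas, address, value, argsOffset, argsSize, retOffset, retSize)
            // the 4 bytes of calldata are read from memory at `ptr`
            success := call(gas(), target, 0, ptr, 4, 0, 0)
        }
    }
}
```

Note that a low-level call to an address without code also reports success, which is one reason the compiler-generated version performs extra checks.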

You can look at the full snippet of the EVM opcode for this external call in the “All About Solidity” github repository.

Conclusion

Memory in the EVM is an important area to learn. It enables the EVM to perform message calls such as the standard call, staticcall and delegatecall. The calldata and payload sent alongside message calls are stored in and retrieved from memory.

Therefore, the EVM memory allows for better composability, making it possible to create flexible internal functions and subroutines in smart contracts. Moreover, parameters defined as memory enable contracts to receive calls and arguments from various sources, both from EOAs and external contract calls (loading the payload from the calldata to memory), and also make it possible to compose inputs directly from within internal functions.

Finally, memory should be dealt with carefully when used in low level assembly. This is to ensure that you will not overwrite some reserved memory space that already contains some data. It is therefore your responsibility to respect Solidity memory management.

The Solidity language also provides the “memory-safe” annotation for inline assembly blocks, to use inline assembly more safely and respect the Solidity memory model.

See the Conventions section in the Solidity docs for more details.
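As a small illustration of the annotation (available from Solidity 0.8.13; the contract and function names are hypothetical), the block below only writes to the scratch space at offsets 0x00 to 0x3f, which the Solidity memory model allows assembly to use freely, and hashes the two words it wrote there.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.13;

// Hypothetical contract, for illustration only.
contract MemorySafeHash {
    function hashPair(bytes32 a, bytes32 b)
        public
        pure
        returns (bytes32 result)
    {
        assembly ("memory-safe") {
            // write both words into the scratch space reserved for hashing
            mstore(0x00, a)
            mstore(0x20, b)
            // hash the 64 bytes starting at offset 0x00
            result := keccak256(0x00, 0x40)
        }
    }
}
```

The result should match keccak256(abi.encode(a, b)), without allocating any new memory.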




