
Solidity Data Locations: Understanding Storage | by Jean Cvllr | Jul, 2022


Diving into storage references and smart contract storage layout.


This is Part I of the “All About Data Locations” sub-series.

In today’s article, we cover in more detail an important data location in the EVM: the smart contract storage.

We will see how the layout of the contract storage works, how storage references work, and how to write to the contract storage in assembly. We will also use some contracts from OpenZeppelin and Compound to learn how storage references work in practice, while learning the Solidity code behind these popular contracts and protocols along the way.

  • Introduction
  • Layout of Storage
  • Basics of Storage
  • Interacting with Storage
  • Storage pointers in function parameters
  • Storage pointers in function body
  • The cost of reading storage.
  • Conclusion

Understanding the storage model in Ethereum and EVM-based chains is essential for good smart contract development.

The only place where you can store data permanently on a smart contract so that it can then be accessed for future executions is in its storage. Each smart contract maintains its state in its own permanent storage. It acts like “a mini-database for a smart contract”, but unlike other databases, this database is publicly accessible. All the values stored in the smart contract storage are available for external reading for free (via static calls), without the need to send a transaction to the blockchain.

Writing to the storage, however, is quite expensive. In fact, it is one of the most expensive operations in the EVM when it comes to gas cost. The content of the storage can be changed with sendTransaction calls. Such calls change state. This is the reason why contract-level variables are referred to as state variables.

One important thing to remember is that by design in Ethereum and the EVM, a contract can neither read nor write any storage apart from its own. The only way contract A can read or write from the storage of another contract B is when contract B exposes functions that enable it to do so.

The storage of a smart contract is a persistent read-write data location. This means that if data is written to the contract storage in one transaction, it persists once the transaction is completed. Reading the contract storage after this transaction will retrieve the data that was written or updated by this previous transaction.

Each contract has its own storage, which can be described by the following rules:

  • Hold state variables
  • Persistent between transactions and function calls
  • Reading is free, but writing is expensive
  • Contract storage is pre-allocated during contract construction.

Variables that reside in storage are referred to as state variables in Solidity.

The only thing you should remember about contract storage is:

Storage is long-term and expensive!

Saving data into storage is one of the operations that requires the highest amount of gas in the EVM.

What is the actual cost of writing to storage?

The cost is not always the same, and calculating the gas cost of writing to storage follows a fairly complicated formula, especially since the Berlin and London network upgrades (EIP-2929 and EIP-3529).

As a simple summary, the cost of writing to storage is as follows:

  • initializing a storage slot (for the first time, or if the slot does not contain any value), that is, changing it from zero to a non-zero value, costs 20,000 gas
  • editing the value at a storage slot costs 5,000 gas
  • deleting the value at a storage slot gives a gas refund (15,000 gas historically, reduced to 4,800 gas by EIP-3529 in the London upgrade)
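
The three cases above can be sketched with a small contract. This is only an illustration: the contract name GasCases and its functions are hypothetical, and the comments describe what happens when the functions are called in the order initialize(), update(), clear().

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate the three write cases.
contract GasCases {
    uint256 private value; // slot 0, zero by default

    // zero -> non-zero: initializes the slot (the most expensive write)
    function initialize() external {
        value = 1;
    }

    // non-zero -> other non-zero: edits an already initialized slot
    function update() external {
        value = 2;
    }

    // non-zero -> zero: clears the slot, which earns a gas refund
    function clear() external {
        delete value;
    }

    function read() external view returns (uint256) {
        return value;
    }
}
```

Comparing the gas used by each call in the Remix terminal is a simple way to observe the difference between the three cases.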

Is reading contract storage really free?

The storage of a smart contract is free to read externally (from an EOA). In such a case, no gas has to be paid.

However, gas has to be paid if the read operation is part of a transaction that modifies the state on the contract, another contract, or on the blockchain.

Can a contract read other contracts’ storage?

By default, a smart contract can only read its own storage during execution (via the SLOAD opcode, as we will see later). But a smart contract can also read the storage of other smart contracts if those contracts expose functions in their public interface (ABI) that return data from specific state variables or storage slots.

As explained by OpenZeppelin in their EVM in-depth part 2 article, the storage of a smart contract is a word-addressable space. This is in contrast to memory or calldata, which are linear data locations (growing byte arrays) where you access data through offsets (indexes in the byte array).

Instead, the smart contract storage is a key-value mapping (= database), where the key corresponds to a slot number in the storage, and the value is the actual value stored in this storage slot.

The storage of a smart contract is made of slots, where:

  • Each storage slot can contain words up to 32 bytes long.
  • Storage slots start at position 0 (like array indexes)
  • There are 2²⁵⁶ storage slots available in total (for read/write)

In summary:

A smart contract’s storage consists of 2²⁵⁶ slots, where each slot can contain values of size up to 32 bytes.

Under the hood, the contract storage is a key-value store, where 256-bit keys map to 256-bit values. All the values at each storage slot are initially set to zero, but they can also be initialized to a non-zero or some specific value in the constructor during contract deployment.

The contract storage as shelves in a hangar

In his article, Steve Marx describes the storage of a smart contract as “an astronomically large array, initially full of zeros, where the entries (indexes) in the array are the contract’s storage slots”.

What would that look like in the real world? How can we represent the storage of a smart contract with something that we are probably most familiar with?

The layout of a contract’s storage is quite similar to the hangar of a building-supply dealer.

I invite you to visit your local building-supply dealer! You will get a good representation of what the storage of a smart contract looks like.

If you go to the outdoor part in particular (where trucks and vans go to buy bags of cement, pallets of bricks, or rails of steel), you will notice quite a lot of activity. Forklifts are driving around everywhere, really fast, taking things off the shelves. This is equivalent to what the EVM does when reading a state variable:

contract Owner {
    address _owner;

    function owner() public view returns (address) {
        return _owner;
    }
}

In the contract above, there is only one shelf (= one slot). The EVM loads the variable from “shelf 0” and unloads it (onto the stack) to present it to you.

Layout of state variables

The lead Solidity developer chriseth describes the storage of a contract as follows:

“You can think of storage as a large array that has a virtual structure… a structure you cannot change at runtime — it is determined by the state variables in your contract”.

From the example above, we can see that Solidity assigns a storage slot to every defined state variable of your contract. For statically-sized state variables, the storage slots are assigned contiguously, starting from slot 0, in the order that the state variables are defined (the rules are different for dynamically-sized types such as mappings and dynamic arrays).

What chriseth means here is that “storage cannot be created in function calls”. In fact, it would make little sense to create new storage variables at new storage slots through a function if the data has to be persistent (the case of mappings is slightly different, however).

The storage of a smart contract is laid out during contract construction (at the time the contract is deployed). This means the layout of the contract’s storage is set in stone at the time of contract creation. The layout is “shaped” based on your contract-level variable declarations, and such layout cannot be changed by future method calls.

Let’s see the actual storage layout of the previous contract using the solc command line tool. If you run the command below:

solc contracts/Owner.sol --storage-layout --pretty-json

You will obtain the following JSON output:

======= contracts/Owner.sol:Owner =======
Contract Storage Layout:
{
"storage":
[
{
"astId": 3,
"contract": "contracts/Owner.sol:Owner",
"label": "_owner",
"offset": 0,
"slot": "0",
"type": "t_address"
}
],
"types":
{
"t_address":
{
"encoding": "inplace",
"label": "address",
"numberOfBytes": "20"
}
}
}

From the JSON output above, we can see a storage field that contains an array of objects. Each object in this array refers to a state variable, identified by its label. We can also see that each variable is mapped to a slot and has an underlying type.

This means that the variable _owner can be changed to any valid value of the same type (address in our case). However, slot 0 is reserved for this variable and will always be there.

Let’s now have a look at how state variables are laid out in storage (see the Solidity documentation for further understanding).

Consider the following Solidity code:

pragma solidity ^0.8.0;

contract StorageContract {
    uint256 a = 10;
    uint256 b = 20;
}

All statically-sized variables are placed into storage slots sequentially, in the order they are defined.

Remember: each slot in storage can hold up to 32 bytes long values.

In our example above, a and b are 32 bytes long (since their type is uint256). Therefore, they each get assigned their own storage slot.

Packing state variables in one storage slot.

Nothing exceptional in our previous example. But let’s now consider the scenario where you have several uint variables of different sizes, as follows:

pragma solidity ^0.8.0;

contract StorageContract {
    uint256 a = 10;
    uint64 b = 20;
    uint64 c = 30;
    uint128 d = 40;

    function readStorageSlot0() public view returns (bytes32 result) {
        assembly {
            result := sload(0)
        }
    }

    function readStorageSlot1() public view returns (bytes32 result) {
        assembly {
            result := sload(1)
        }
    }
}

We have written two basic functions to read the contract storage slots at a low level. Looking at the outputs, we obtain the following:

[Screenshot: outputs of readStorageSlot0() and readStorageSlot1() in Remix]

The Solidity documentation states that:

“multiple, contiguous items that need less than 32 bytes are packed into a single storage slot if possible…

The first item in a storage slot is stored lower-order aligned.”

So when variables are smaller than 32 bytes, Solidity tries to pack more than one variable into a storage slot if they fit. As a result, a storage slot can hold more than one state variable.
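
To make the packing concrete, here is a minimal sketch that reads the raw word at slot 1 of the contract above and splits it back into its three variables. The contract name PackedStorage and the helper unpackSlot1() are hypothetical and only serve the illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract PackedStorage {
    uint256 a = 10; // slot 0 (fills the whole slot)
    uint64 b = 20;  // slot 1, bytes 0..7 (lower-order aligned)
    uint64 c = 30;  // slot 1, bytes 8..15
    uint128 d = 40; // slot 1, bytes 16..31

    // Hypothetical helper: splits the raw 32-byte word at slot 1 into b, c and d.
    function unpackSlot1()
        public
        view
        returns (uint64 rawB, uint64 rawC, uint128 rawD)
    {
        uint256 word;
        assembly {
            word := sload(1)
        }
        rawB = uint64(word);         // keep the lowest 64 bits
        rawC = uint64(word >> 64);   // skip b, keep the next 64 bits
        rawD = uint128(word >> 128); // skip b and c, keep the top 128 bits
    }
}
```

Calling unpackSlot1() returns 20, 30 and 40, which shows that b sits in the lowest-order bytes of the slot, followed by c and then d.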

If an elementary type does not fit the remaining space available in a storage slot, it is moved to the next storage slot. For the following Solidity contract:

pragma solidity ^0.8.0;

contract StorageContract {
    uint256 a = 10;
    uint64 b = 20;
    uint128 c = 30;
    uint128 d = 40;
}

The layout of its storage would look like this:

  • slot 0 holds a (32 bytes)
  • slot 1 holds b and c (8 + 16 = 24 bytes); d does not fit in the remaining 8 bytes
  • slot 2 holds d (16 bytes)

Let’s look at a more concrete example from the OpenZeppelin library: the VestingWallet contract. Once deployed with a beneficiary address and some configuration, this contract can receive Ether or ERC20 tokens.

The contract can then release these vested tokens to the beneficiary address once the vesting schedule has passed.

Let’s deploy our own Vested Wallet on Remix and analyze the storage slots.

If we look at the top state variables defined, we can see that the configuration values beneficiary and start timestamp are packed into the same storage slot because of their type.

[Screenshot: state variables at the top of the contract. Source: VestingWallet.sol]

Storage Layout with inheritance

The layout of contract storage is also based on inheritance. If a contract inherits from other contracts, its storage layout follows the inheritance order.

  • state variables defined in the most base contract start at slot 0.
  • state variables defined in the following derived contracts are placed in the subsequent slots (slot 1, 2, 3, etc.).

Also, note that the same rules for packing state variables into one storage slot apply. Where the packing rules allow it, state variables from different parent and child contracts share the same storage slot.
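
As a minimal sketch of these rules, assume two hypothetical contracts Base and Derived. The variable of the base contract comes first, and the first variable of the derived contract is small enough to be packed into the same slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Base {
    uint128 baseValue = 1; // slot 0, bytes 0..15
}

contract Derived is Base {
    uint128 derivedValue = 2; // slot 0, bytes 16..31 (packed with baseValue)
    uint256 other = 3;        // slot 1

    // Returns the slot number assigned to each state variable.
    function slots()
        public
        view
        returns (uint256 baseSlot, uint256 derivedSlot, uint256 otherSlot)
    {
        assembly {
            baseSlot := baseValue.slot
            derivedSlot := derivedValue.slot
            otherSlot := other.slot
        }
    }
}
```

Calling slots() on Derived returns 0, 0 and 1: the parent and child variables share slot 0, and the next variable moves on to slot 1.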

The EVM provides two opcodes to interact with the storage: SLOAD to read from it, and SSTORE to write to it. These opcodes can be used directly only in inline assembly. Solidity converts reads and writes of state variables to these opcodes under the hood during compilation.

Reading from storage

The EVM can read the storage of a smart contract using the SLOAD opcode. SLOAD loads a word from storage into the stack.

The SLOAD opcode is available in inline assembly. It can be used to easily retrieve the whole word value stored at a specific storage slot.

function readStorageNb(uint256 slotNb)
    public
    view
    returns (bytes32 result)
{
    assembly {
        result := sload(slotNb)
    }
}

This is what Solidity does under the hood. When reading state variables via a getter function, it automatically uses the SLOAD opcode. Take for instance the popular name() or symbol() functions from ERC20: these functions do nothing other than return the state variable. See the screenshot below from OpenZeppelin.

[Screenshot: the name() and symbol() getters. Source: OpenZeppelin GitHub repository, ERC20.sol]

If you query the name() function in Remix and debug the getter, you will obtain the following opcodes.

; name()
JUMPDEST
PUSH1 60
PUSH1 03 ; step 1 - push the number 3 on the stack (= slot nb 3)
DUP1
SLOAD ; step 2 - pass the number 3 as argument to SLOAD to
; load the value stored in the storage slot nb 3
; (where the `_name` variable is stored)
; rest of the opcodes are omitted for brevity

Writing to storage

The EVM can write to the storage of a smart contract using the SSTORE opcode. SSTORE saves a word to storage.

Using inline assembly, this would look like this:

function writeToStorageSlot(uint256 slotNb) public {
    // bytes32 rather than string memory: in assembly, a memory string is only a pointer
    bytes32 value = "All About Solidity";
    assembly {
        sstore(slotNb, value)
    }
}

Let’s continue with our previous example of the ERC20 token from OpenZeppelin. We will obtain the following opcodes if we deploy the ERC20 token contract and debug the constructor using Remix.

MLOAD  ; 1. load the token name from memory
PUSH1 ff
NOT
AND
DUP4
DUP1
ADD
OR
DUP6 ; 2. put back 3 (= slot nb for `name`) on top of the stack
SSTORE ; 3. store at storage slot 3 the token `name` parameter
PUSH3 0003ee
JUMP

Try it out on Remix and debug the transaction after deploying the ERC20 token.

This tweet describes well how the opcode SSTORE works in the geth client.

We can see from the source code of the geth client that SSTORE pops two values from the stack: the topmost one, loc, is the storage location, and the second one, val, is the value to store in the storage.

We can also see that, when writing to the contract storage via interpreter.evm.StateDB.SetState(...), both items taken from the stack are converted to bytes32 values.

We can therefore see directly from the source code of the geth client what we explained in the section Layout of Storage: the smart contract storage maps bytes32 key to bytes32 values, and therefore everything is treated as bytes32 words under the hood by the EVM.

Here is also a final diagram from the same author of the tweet, explaining the flow of the SSTORE opcode in detail.

[Diagram: flow of the SSTORE opcode. Source: faheel, from Twitter]

The storage keyword can be specified for variables of complex types passed as parameters to functions. But how does this work?

When storage is specified for a function parameter, this means that the argument passed to the function must be a state variable.

Let’s use a very simple example, still continuing on the OpenZeppelin library. This will also help us to better understand the contracts and libraries part of their package.

OpenZeppelin provides a Timers library that can be used to build and deal with timers and timepoints in Solidity contracts. Look at the functions setDeadline(...) and reset(...) below and their parameters.

[Screenshot: the setDeadline(...) and reset(...) functions. Source: Timers.sol from the OpenZeppelin GitHub repository]

These two functions accept storage pointers only. What does that mean?

Let’s create a TimeWatch contract to understand!

If you try compiling this contract on Remix, the Solidity compiler should complain with the following error:

[Screenshot: the compiler error in Remix. Caption: debugging storage pointer errors 101]

This error makes sense. The setDeadline(...) function from the Timers library only accepts storage pointers. This means that the function will accept as arguments:

  • either a state variable directly
  • or a reference to a state variable (another storage reference, or what I like to call a storage pointer).

Let’s then rewrite our TimeWatch contract to make it work. We can also add a reset button.
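
As a minimal sketch of what a working TimeWatch could look like: the Timers library below is a stripped-down stand-in for the OpenZeppelin one (an assumption, only the parts needed here are reproduced), and the function names startTimer, resetTimer and getDeadline are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Stripped-down stand-in for OpenZeppelin's Timers library (assumption:
// only the struct and the two functions discussed here are reproduced).
library Timers {
    struct Timestamp {
        uint64 _deadline;
    }

    function setDeadline(Timestamp storage timer, uint64 timestamp) internal {
        timer._deadline = timestamp;
    }

    function reset(Timestamp storage timer) internal {
        timer._deadline = 0;
    }
}

// Hypothetical TimeWatch contract.
contract TimeWatch {
    using Timers for Timers.Timestamp;

    Timers.Timestamp private _timer; // state variable: lives in storage

    function startTimer(uint64 timestamp) public {
        // A memory struct would be rejected by the compiler here:
        //   Timers.Timestamp memory timer;
        //   timer.setDeadline(timestamp);
        _timer.setDeadline(timestamp); // accepted: _timer is a state variable
    }

    function resetTimer() public {
        _timer.reset();
    }

    function getDeadline() public view returns (uint64) {
        return _timer._deadline;
    }
}
```

The only difference with a failing version is the data location of the timer: a state variable is accepted where a memory struct is not.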

We have seen a basic example of storage pointers for function parameters. Let’s go a bit deeper with a more complex example to better understand storage pointers in function parameters.

When the parameter of a function is a storage reference, the function can accept either a state variable directly or a reference to a state variable.

Let’s continue building on our TimeWatch example. We could use the Timers library to build a Race Tournament contract. Using a contract would reduce the level of trust placed in the race organizer or any third party that could be suspected of tampering with the timers and the rules.

Below is a prototype. The contract keeps track of the racers involved and their time via a mapping. Pay attention to the startRacerTime(…) function below.

This compiles fine because the racerTimer points to some entry in the mapping of racers (the contract storage). Therefore, since this variable is a reference to the contract storage, the setDeadline(...) function from the Timers library will accept it as a valid function argument.
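
A sketch of such a prototype could look as follows. It is an assumption-laden illustration, not the original listing: the Timers library is again a stripped-down stand-in for the OpenZeppelin one, and the contract name RaceTournament, the _racers mapping and the racerTime getter are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Stripped-down stand-in for OpenZeppelin's Timers library (assumption).
library Timers {
    struct Timestamp {
        uint64 _deadline;
    }

    function setDeadline(Timestamp storage timer, uint64 timestamp) internal {
        timer._deadline = timestamp;
    }
}

// Hypothetical race tournament prototype.
contract RaceTournament {
    using Timers for Timers.Timestamp;

    // Each racer address maps to its own timer in the contract storage.
    mapping(address => Timers.Timestamp) private _racers;

    function startRacerTime(address racer) public {
        // racerTimer is a storage pointer to one entry of the mapping
        Timers.Timestamp storage racerTimer = _racers[racer];
        racerTimer.setDeadline(uint64(block.timestamp));
    }

    function racerTime(address racer) public view returns (uint64) {
        return _racers[racer]._deadline;
    }
}
```

Here racerTimer is not a state variable itself, but it refers to one (an entry of the mapping), which is enough for setDeadline(...) to accept it.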

Assigning a storage variable to a local variable (defined in the function body) always copies when the variable is of elementary type.

However, for complex or dynamic types, the rules are different. You can add the storage keyword to a local variable whose value you do not want to be cloned.

We describe these variables as storage pointers or local variables of storage reference type.

Any storage reference variable in a function always refers to a piece of data pre-allocated in the contract’s storage. In other words, a storage reference always refers to a state variable.
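
The difference between copying and referencing can be sketched with a small contract. All names below (Players, Player, counter and the functions) are hypothetical and only serve the illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Players {
    struct Player {
        uint256 score;
    }

    Player[] private players;
    uint256 public counter = 5;

    constructor() {
        players.push(Player(0));
    }

    // Elementary type: the value is copied, storage is left untouched.
    function copyElementary() public view returns (uint256) {
        uint256 local = counter; // copied onto the stack
        local += 1;              // counter in storage is unchanged
        return local;
    }

    // Storage pointer: every write goes to the contract storage.
    function updateViaStoragePointer() public {
        Player storage p = players[0];
        p.score = 100;
    }

    // Memory copy: writes only affect the copy.
    function updateMemoryCopy() public view returns (uint256) {
        Player memory p = players[0];
        p.score = 999;
        return players[0].score; // still the stored value
    }

    function score() public view returns (uint256) {
        return players[0].score;
    }
}
```

Only updateViaStoragePointer() changes what score() returns afterwards; the two other functions work on copies.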

Let’s use the Solidity code of a very popular Smart Contract Governance protocol, used as a basis for many other governance protocols: Compound.

The GovernorAlpha smart contract has been influential in building governance protocols. This contract is used as the bare bones of the governance not only for Compound but also for Uniswap and Indexed Finance.

Let’s look at one of the core functions of GovernorAlpha. The function propose(...), as suggested by its name, enables the creation of a new proposal (e.g. changing the interest rate of a cToken). If you look below, you will see two examples of what we explained previously:

In line 153, the local variable proposalId is assigned the value of the state variable proposalCount. Since this local variable is of elementary type (a uint), the value is copied/cloned from the contract storage (from the state variable proposalCount) to the local variable (on the stack). Any changes made to the local variable do not propagate to the contract storage.

In Compound, this line is used to save the new proposal id locally (generated by incrementing proposalCount on line 152). This saves some gas too. Look at lines 154 and 157. If proposalCount (the actual state variable) were used there instead of proposalId, the contract storage would be read twice.

In line 154, using the new proposalId, a newProposal is created. Since the newProposal variable is a struct (complex type), we have to specify the data location the EVM will operate on when we manipulate and edit this variable afterward.

This example uses a storage reference.

  • What does that mean? newProposal refers to some place in the contract storage.
  • Which place in the contract storage does it refer to? It refers to a Proposal inside the proposals mapping.
  • Which Proposal? The one referred to by proposalId in the mapping.

What does this storage keyword then imply? It will imply that every change made to the newProposal variable will result in writing data to the contract storage. You can see starting from line 157 that all the details of the new Proposal are written via the Proposal struct members one after the other. Each of these lines writes to the contract storage.

Once the function has been executed, the new Proposal will have been saved in the contract storage, and the changes will persist.
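
The pattern described above can be reduced to the following sketch. This is not the actual Compound code: SimpleGovernor and its reduced Proposal struct are hypothetical, and only the storage-related steps are kept.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical, heavily simplified governor: only the storage pattern is kept.
contract SimpleGovernor {
    struct Proposal {
        uint256 id;
        address proposer;
        string description;
    }

    uint256 public proposalCount;
    mapping(uint256 => Proposal) public proposals;

    function propose(string memory description) public returns (uint256) {
        proposalCount++;
        // Elementary type: the value is copied from storage to the stack.
        uint256 proposalId = proposalCount;

        // Complex type: newProposal is a reference into the proposals mapping.
        Proposal storage newProposal = proposals[proposalId];
        require(newProposal.id == 0, "proposal id collision");

        // Each assignment below writes to the contract storage.
        newProposal.id = proposalId;
        newProposal.proposer = msg.sender;
        newProposal.description = description;

        return proposalId;
    }
}
```

After a call to propose(...), the new entry can be read back through the proposals getter, which confirms that the writes made through newProposal persisted in storage.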

You can read and write to the contract storage in inline assembly by specifying a storage slot and storage offset.

We saw previously that some variables in storage do not necessarily occupy a single full storage slot but instead are sometimes packed together.

And we also saw that SLOAD as an opcode accepts only the storage slot number as a parameter and returns the full bytes32 value stored under this slot.

But then how do we read a state variable that is packed among other state variables in the same storage slot?

Take the following contract as an example:

contract Storage {
    uint64 a;
    uint64 b;
    uint128 c;
}

The Solidity docs explain the following:

For local storage variables or state variables, a single Yul identifier is not sufficient, since they do not necessarily occupy a single full storage slot.

Therefore, their “address” is composed of a slot and a byte-offset inside that slot.

Therefore, the “address” of a variable is made of two components:

  • the slot number: where the variable resides.
  • the byte offset: where the variable starts (inside that slot).

Let’s continue with some basic assembly code to understand better. Take a look at the following contract and its functions:

contract Storage {
    uint64 a = 1;
    uint64 b = 2;
    uint128 c = 3;

    function getSlotNumbers() public view returns (uint256 slotA, uint256 slotB, uint256 slotC) {
        assembly {
            slotA := a.slot
            slotB := b.slot
            slotC := c.slot
        }
    }

    function getVariableOffsets() public view returns (uint256 offsetA, uint256 offsetB, uint256 offsetC) {
        assembly {
            offsetA := a.offset
            offsetB := b.offset
            offsetC := c.offset
        }
    }
}

Running these two functions in Remix gives the following outputs: all three variables are located at slot 0, with byte offsets 0, 8 and 16 for a, b and c respectively.

With inline assembly and Yul, to retrieve the slot pointed to by the variable c, you use c.slot, and to retrieve the byte-offset, you use c.offset. Using c itself will result in an error.

function ReadVariableC() public view returns (uint128 value) {
    assembly {
        value := sload(c)
    }
}
The code above will not compile: the compiler reports an error because c is used on its own instead of c.slot or c.offset.
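
A version that does compile has to combine the slot and the offset. The sketch below is one possible way to do it (the function names are hypothetical): it loads the whole word at the variable's slot, shifts it right by the byte offset converted to bits, and masks the result to the size of the variable.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Storage {
    uint64 a = 1;
    uint64 b = 2;
    uint128 c = 3;

    // Reads c (16 bytes, at byte offset 16 of slot 0).
    function readVariableC() public view returns (uint128 value) {
        assembly {
            let word := sload(c.slot)
            // move c down to the lowest-order bits (offset is in bytes)
            let shifted := shr(mul(c.offset, 8), word)
            // keep only the 128 bits that belong to c
            value := and(shifted, 0xffffffffffffffffffffffffffffffff)
        }
    }

    // Reads b (8 bytes, at byte offset 8 of slot 0).
    function readVariableB() public view returns (uint64 value) {
        assembly {
            let word := sload(b.slot)
            let shifted := shr(mul(b.offset, 8), word)
            // keep only the 64 bits that belong to b
            value := and(shifted, 0xffffffffffffffff)
        }
    }
}
```

With the initial values above, readVariableC() returns 3 and readVariableB() returns 2. The mask matters for b: without it, the bits of c would still be present above the 64 bits of b.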

One thing to mention too is that in inline assembly, you cannot assign to the .slot or the .offset part of a storage variable.

function doesNotCompile() public {
    assembly {
        a.slot := 8
        a.offset := 9
    }
}

[Screenshot: the error reported by the solc compiler, taken from Remix]

What is the offset for storage pointers in Yul?

In function bodies, some variables can be storage pointers/storage references. For example, this includes structs, arrays and mappings. For such variables, the .offset will always be zero in Yul because such variables always occupy a full storage slot and cannot be packed tightly in storage with other variables.
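
This can be checked with a small sketch. The contract PointerOffsets, its Item struct and the pointerLocation function are hypothetical names chosen for the illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract PointerOffsets {
    struct Item {
        uint64 x;
        uint64 y;
    }

    mapping(uint256 => Item) private items;

    // Returns the slot and byte offset of a local storage pointer.
    function pointerLocation(uint256 key)
        public
        view
        returns (uint256 slotNb, uint256 byteOffset)
    {
        Item storage item = items[key];
        assembly {
            slotNb := item.slot
            byteOffset := item.offset // zero for a storage pointer
        }
    }
}
```

Whatever key is passed, byteOffset comes back as 0, while slotNb changes with the key, since each mapping entry starts at its own slot.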

The storage of a smart contract is costly to write to, whether it is to initialize or modify the data inside. Although it is free to read data from the contract storage externally, you should still consider the gas cost associated with reading from the storage of a smart contract if these read operations are part of a transaction that changes state.

Because of the high gas cost of operating on storage, the Solidity docs state an important consideration:

you should minimize what you store in persistent storage to what the contract needs to run.

It is recommended to store certain data outside of the contract storage whenever possible, to minimize the associated gas cost.




