Does Variable Name Length Affect Python Code Speed? | by Nicholas Obert | Aug, 2022


A deep dive into Python internals to find out how size matters

Photo by charlesdeluvio on Unsplash

If you’ve been working on code optimizations for a while, you may have come across the following question: does the length of the variable names affect the code runtime performance in any way?

At first, you may think that using three-character variable names won’t make much difference in execution speed compared to more descriptive names. Besides, nobody would make their code unreadable just to save a few milliseconds.

But then you think about it some more and start asking yourself: “Does variable name length really affect code performance?” It’s a legitimate and genuinely interesting question.

When the source code of a compiled language is translated into machine code, the concept of a variable starts to lose its meaning. All that remains of objects, classes, functions, and other high-level concepts are memory addresses, jump instructions, and simple operations. In particular, variables are translated into memory addresses and offsets.

Python source code, however, isn’t usually compiled directly into machine code; instead, it’s executed by CPython, the default Python interpreter. To be more precise, CPython first compiles the source code into byte code and then executes that byte code. During this compilation, it keeps a reference to variable names. Otherwise, Python wouldn’t be able to print nice and informative error messages like this one:

Traceback (most recent call last):
  File "/home/nic/Desktop/er.py", line 7, in <module>
    foo()
  File "/home/nic/Desktop/er.py", line 4, in foo
    a
NameError: name 'a' is not defined

The first step in figuring out whether variable names affect code performance is to write some code and collect data. For this test, I created two simple functions: one uses a one-character variable name, while the other uses an extremely long name of 4,014,196 characters (about 4MB).

The benchmark is performed using the built-in timeit module. In the code snippet below, I substituted the 4MB-long name with a shorter descriptive one, so you don’t have to download the whole 8MB gist.
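As a rough, self-contained sketch of this kind of benchmark, the code below generates both functions with `exec` instead of typing the names out, and assumes a 100,000-character long name rather than a 4MB one so it runs quickly. The helper `make_function` and the loop inside the generated function are hypothetical, not the author’s gist.

```python
import timeit


def make_function(var_name: str):
    """Build a function whose only real local variable is called var_name.

    Hypothetical helper: the source is generated as a string and executed,
    so the long name never has to be typed out or downloaded.
    """
    source = (
        "def f():\n"
        f"    {var_name} = 0\n"
        "    for _ in range(100):\n"
        f"        {var_name} = {var_name} + 1\n"
        f"    return {var_name}\n"
    )
    namespace = {}
    exec(source, namespace)  # compiles the source to byte code, then defines f
    return namespace["f"]


short_fn = make_function("a")                     # one-character name
long_fn = make_function("long_" + "x" * 100_000)  # assumed size, not the article's 4MB

# Both functions are already compiled here, so timeit measures only execution.
short_time = timeit.timeit(short_fn, number=10_000)
long_time = timeit.timeit(long_fn, number=10_000)

print("Short time:", short_time)
print("Long time:", long_time)
```

If you run it several times, which function comes out ahead is likely to vary from run to run, since both execute identical byte code apart from the stored name.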

It turns out that the benchmark results are very inconsistent. Sometimes one function is faster; sometimes the other, as you can see below:

Short time: 0.08893431029973727
Long time: 0.07757537399993453

Short time: 0.08859034220022295
Long time: 0.07780963819968748

Short time: 0.08343082869996579
Long time: 0.10756722440019076

Short time: 0.08072503170042182
Long time: 0.07815180819961824

Short time: 0.07780468439996184
Long time: 0.07831483780028066

Short time: 0.10425543720011773
Long time: 0.09920399159964291

So, if there were any meaningful runtime performance difference, an extreme test case like this one would have revealed it; instead, neither function is consistently faster. However, don’t stop reading yet: the interesting part is coming. The long variable name is still there, so it must make a difference somehow.

Analyzing the byte code

To dive deeper into the topic, let’s separate those two functions into different files:

Then, compile them into byte code by running each one as a module (pass the file name without the .py extension, e.g. short_name):

python3 -m <module_name>

This will generate a __pycache__ folder in the current directory containing the compiled Python code. If we inspect the generated files with a hex editor, we can see that the variable name has survived the compilation and is present in the byte code.

However, the variable name is only mentioned once in the byte code, as you can see here:

Byte code generated for the “short_name.py" file
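To check the “only once” observation without a hex editor, a sketch along these lines writes two small stand-in files, compiles them with the standard library’s `py_compile` module (an alternative to running `python3 -m`), and counts how often the long name occurs in the source versus in the resulting `.pyc`. The file contents and the 100,000-character name are assumptions chosen for illustration.

```python
import py_compile
import tempfile
from pathlib import Path

# Hypothetical stand-in for the article's 4,014,196-character name.
LONG_NAME = "long_" + "x" * 100_000


def make_source(var_name: str) -> str:
    """Return a tiny module whose function mentions var_name three times."""
    return (
        "def foo():\n"
        f"    {var_name} = 0\n"
        "    for _ in range(100):\n"
        f"        {var_name} = {var_name} + 1\n"
    )


sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for stem, var_name in (("short_name", "short_name"), ("long_name", LONG_NAME)):
        source = make_source(var_name)
        src_path = Path(tmp) / f"{stem}.py"
        src_path.write_text(source)
        # Writes the .pyc into __pycache__ next to the source and returns its path.
        pyc_bytes = Path(py_compile.compile(str(src_path), doraise=True)).read_bytes()
        sizes[stem] = (len(source.encode()), len(pyc_bytes))
        if stem == "long_name":
            source_count = source.count(LONG_NAME)
            pyc_count = pyc_bytes.count(LONG_NAME.encode())

print("(source bytes, .pyc bytes):", sizes)
print("long name in source:", source_count, "| in .pyc:", pyc_count)
```

The name is mentioned three times in the source, but the compiled file should store it a single time, in the function’s table of local variable names.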

This means that CPython keeps the variable name but, much like compiled languages, references the variable by something other than its name. To understand what’s happening under the hood, we must dive deeper.

Disassembling the byte code

Now, we could either reverse-engineer the byte code to discover how the variable is referenced, or simply paste the Python source code into Compiler Explorer and let it disassemble the code for us.

Don’t worry if you don’t understand assembly languages. I will thoroughly comment on every line so anyone can grasp what’s happening.

If this is your first time reading assembly language, you may need to spend some extra time on the comments to fully understand what each instruction does.

From what you can see, Python references the short_name variable several times in the program. So, why does the name appear only once in the byte code? If you pay close attention to the number to the right of each instruction that references short_name, you can see that it is always 0. This number is the variable’s index in the function’s table of local variables, and the byte code uses it instead of the variable name.

 2 STORE_FAST               0 (short_name)
16 LOAD_FAST                0 (short_name)
22 STORE_FAST               0 (short_name)

Note that the variable name in parentheses is only added by Compiler Explorer for better readability and is not part of the byte code.
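CPython’s own `dis` module can reproduce this observation locally. The sketch below uses a hypothetical function shaped like the one in the article (its exact body is an assumption) and prints, for each single-variable load or store, the raw argument stored in the byte code next to the name at that position in the code object’s `co_varnames` tuple. Opcode names and offsets vary between CPython versions, and newer versions also emit fused instructions such as `LOAD_FAST_LOAD_FAST`, which this sketch simply skips.

```python
import dis


def short():
    # Hypothetical stand-in for the article's short_name function.
    short_name = 0
    for _ in range(10):
        short_name = short_name + 1
    return short_name


code = short.__code__

# Opcodes whose argument is a single index into the local-variable table.
# Which of these actually appear depends on the CPython version.
SINGLE_LOCAL_OPS = {"LOAD_FAST", "STORE_FAST", "LOAD_FAST_CHECK", "LOAD_FAST_BORROW"}

local_refs = [
    instr for instr in dis.get_instructions(short) if instr.opname in SINGLE_LOCAL_OPS
]

print("co_varnames:", code.co_varnames)
for instr in local_refs:
    # instr.arg is the raw number in the byte code; the name is only looked up
    # in co_varnames when a disassembler needs something readable to display.
    print(f"{instr.offset:>3} {instr.opname:<12} {instr.arg} -> {code.co_varnames[instr.arg]}")
```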

We can further confirm that the rightmost number is this local index by adding other variables to the local scope and watching how the number changes accordingly. I will show only the relevant assembly lines in the following code snippet:
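A small, hypothetical variation on the same experiment: give a function several local variables and list the index each one is assigned. The function below is an assumption, not the author’s snippet; the indices come from the code object’s `co_varnames` tuple, the table that the `LOAD_FAST`/`STORE_FAST` arguments point into.

```python
def several():
    # Hypothetical function with three locals, used only to inspect indices.
    first = 1
    second = 2
    third = first + second
    return third


# Parameters (none here) come first; then each local gets the next free index,
# in the order the compiler first encounters it.
slots = {name: index for index, name in enumerate(several.__code__.co_varnames)}
print(slots)  # {'first': 0, 'second': 1, 'third': 2}
```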

We now know that the byte code stores the variable name only once and uses its index in the local scope to reference it. But then, how does the variable name length affect code execution speed?

Loading and parsing

In the previous timeit benchmark, we saw that variable name length has no measurable effect on runtime speed. However, that’s because by the time timeit ran the functions, the source code had already been compiled into byte code.

The speed disadvantage of long variable names lies in the loading and parsing steps that happen before execution. To check this, I tested the execution speed of the two separate files shown before, short_name.py and long_name.py, using the Unix time command, which measures how long a command takes to run.

$ time python3 short_name.py
real    0m0,032s
user    0m0,028s
sys     0m0,004s
$ time python3 long_name.py
real    0m0,173s
user    0m0,148s
sys     0m0,024s

I repeated the test multiple times to make sure the results weren’t just noise.

As you can see, short_name.py runs in significantly less time than long_name.py. This is because Python must first load the file into memory, then compile the source code into byte code, and only then execute the program. The larger the file, the longer the first two steps take. Note also that a script run directly with python3 is recompiled on every run, because CPython only caches byte code for imported modules, not for the main script.
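To isolate the compilation cost without shell commands, one could time the built-in `compile()` on the two kinds of source directly. This is a sketch under assumptions: the generated source and the 100,000-character name are stand-ins for the article’s files, and it leaves out the file-loading step that the `time` measurement also includes.

```python
import timeit


def make_source(var_name: str) -> str:
    """Return a tiny module whose function mentions var_name three times."""
    return (
        "def foo():\n"
        f"    {var_name} = 0\n"
        "    for _ in range(100):\n"
        f"        {var_name} = {var_name} + 1\n"
    )


short_source = make_source("short_name")
long_source = make_source("long_" + "x" * 100_000)  # assumed size

# compile() performs the source-to-byte-code step the interpreter runs
# before executing a script; nothing is executed here.
short_compile = timeit.timeit(lambda: compile(short_source, "short_name.py", "exec"), number=20)
long_compile = timeit.timeit(lambda: compile(long_source, "long_name.py", "exec"), number=20)

print(f"short: {short_compile:.4f}s  long: {long_compile:.4f}s  ({len(long_source):,} chars)")
```

Because the long source is roughly 300KB of text, the tokenizer and parser have far more work to do, which is the same effect the `time` benchmark picks up at a larger scale.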

The difference in the benchmark results is only due to the hugely different file sizes, as you can see here:

$ ls -lh
-rw-r--r-- 1 nic nic 7,7M  8 ago 11.05 long_name.py
-rw-r--r-- 1 nic nic  127  8 ago 11.11 short_name.py

long_name.py takes up nearly 8MB on disk, while short_name.py is only 127 bytes, so it’s no wonder the latter runs significantly faster. However, this is an extreme case: in practice, you will never use variable names long enough to tangibly slow down loading and compilation.

To wrap it up, variable name length can affect a Python program’s speed, since larger files take longer to load and compile, as the Unix time benchmark showed. However, once the source code is compiled, any difference is smaller than the random noise caused by unrelated factors, such as other processes running at the same time, as the timeit benchmark showed. In practice, it is completely negligible.

You should never sacrifice code readability for that initial speed advantage of just a few milliseconds. If you’re concerned about memory (both disk and RAM) utilization due to big program sizes, you may want to look into minification, the process of removing all unnecessary characters from the source code without changing its functionality (like those .min.js files you see in web development).

I hope you enjoyed this article. If you have anything to add, please share your thoughts in a comment. Thanks for reading!

If you want to learn more about Compiler Explorer, the tool I used to disassemble Python source code, check out this story below:


