Ever wondered why Node.js is single-threaded? Concurrency vs. Parallelism, Race Conditions, Deadlock, Starvation, Locking (Mutex), Semaphore, Atomic operations, compare-and-swap (CAS), and more.
(For a better understanding, I recommend a quick recap on what threads and processes are, from this article: Difference between Process and Thread)
A Thread is the smallest unit of execution within a process. Each process starts with a single thread, often called the primary thread, but can create additional threads from any thread.
A multithreaded process has multiple threads within a single process, each having its own program counter, stack, and set of registers, but sharing common code, data, and open files.
Motivation for using multithreaded processes
Threads are very useful in modern programming whenever a process has multiple tasks to perform independently of the others. This is particularly true when one of the tasks may be blocked, and it is desired to allow other tasks to proceed without blocking.
For example, in a web server, multiple threads allow multiple requests to be processed simultaneously, without having to serve requests sequentially or to fork off a separate process for every incoming request. (The latter is how this sort of thing was done before the concept of threads was developed: a daemon would listen on a port, fork off a child to process each incoming request, and then go back to listening on the port.)
- Responsiveness
One thread can continue to run while other threads are blocked or busy with intensive computation.
- Resource sharing
As we learned about the design of threads from the previous article, threads share common code, data, and other resources, which allows threads to communicate with each other easily.
- Economy
Creating and managing threads (and context switching between them) is much faster than performing the same tasks for processes.
- Scalability, i.e. utilization of multiprocessor architectures
A single-threaded process can only run on one CPU, no matter how many may be available, whereas the execution of a multi-threaded application may be split among available processors.
Does a multithreaded process run concurrently or in parallel?
To answer this question, let's first understand the difference between these two terms.
Suppose we have a complex problem and break it down into smaller pieces. Executing those smaller pieces simultaneously is parallelism.
On the other hand, concurrency is about the ability to deal with (not necessarily do) many things at once. It's about the way we structure the solution: breaking the problem down into independent components, which enables them to potentially run in parallel. Once we break the problem down, we can execute those components with or without parallelism. Parallelism is not the goal of concurrency; concurrency's goal is a good structure.
If you run an operating system, it might have a mouse driver or a keyboard driver, display driver, network driver, or whatever else. Those are all managed by the OS as independent things inside the kernel. But those are concurrent things, they aren’t necessarily parallel. If your CPU only has one processor, only one of them is ever running at a time.
Multithreading on a single processor gives the illusion of running in parallel. In fact, the processor is switching execution between threads by using a scheduling algorithm, resulting in concurrent execution. At a given time, only one thread is making progress. The switching between threads happens quickly enough that the threads might appear to run simultaneously.
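To make this concrete, here is a small Go sketch (the function name `count` is just for illustration). With `GOMAXPROCS(1)`, the two goroutines below are concurrent, interleaved by the scheduler on one processor, but never parallel:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// count is a small CPU-bound task used to keep a goroutine busy.
func count(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

func main() {
	// Restrict Go to a single processor: the goroutines below are
	// concurrent (the scheduler interleaves them) but never parallel.
	runtime.GOMAXPROCS(1)

	var wg sync.WaitGroup
	results := make([]int, 2)
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = count(1_000_000)
		}(i)
	}
	wg.Wait()
	// Both tasks complete, even though only one ran at any instant.
	fmt.Println(results[0] == results[1]) // true
}
```

Raising `GOMAXPROCS` (the default is the number of CPUs) is what lets the same concurrent structure actually run in parallel.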
… This context switching usually occurs frequently enough that users perceive the threads or tasks as running in parallel (for popular server/desktop operating systems, maximum time slice of a thread, when other threads are waiting, is often limited to 100–200ms).
(Thread (computing) — Wikipedia)
In the same multithreaded process on a multiprocessor machine, each thread can run on a separate processor, resulting in parallel execution, which is truly simultaneous execution. When the number of threads is less than or equal to the number of available processors, the operating system can schedule each thread on a different processor.
How can Node.js handle concurrent requests with a single thread?
Let's step back a little bit and ask ourselves: with all the benefits of multithreading, why does Node.js use a single thread that processes only one request at a time?
First, we should understand what most web applications/services do. For a typical web application with a database, the flow of processing a request looks like this:
user performs an action
application starts processing the action
└──> makes a database request
└──> does nothing until the request completes
└──> sends the result to the user
The part that requires active attention is called CPU active time because it requires the CPU to actively spend time thinking and computing results. It requires a thread to handle it.
The part that doesn’t require active attention is called IO (Input Output) because it involves waiting for something else to provide input or to send output. Examples of IO include waiting to read a file from the file system, waiting while making a network request to another server, or even just waiting for time to pass.
Imagine a multithreaded web server with 100 concurrent requests, where the server spawns one thread per request. While those 100 threads wait for 100 replies from the database, the CPU active time is zero. The parallelism of IO is not bound by threads, so 100 threads are just as efficient as 1 thread. Instead, IO parallelism is bound by IO devices, e.g. the number of network cards. So if we increase the number of threads, performance might not increase as we expected.
What Node.js does is that, while one request is waiting for IO, it saves time by switching to handle other requests. This is what Node.js calls non-blocking IO: Node.js does not wait for a request to complete before processing all the other requests. By default, all requests in Node.js are concurrent; they don't wait for other requests to finish.
What about the IO part, you might ask? Node.js's internal library, libuv, handles the IO. Under the hood, libuv uses the operating system's event notification mechanisms (such as epoll or kqueue) for network IO, and a pool of worker threads for operations that can't easily be done asynchronously, such as file system access and DNS lookups.
In conclusion, choosing a web server should depend on what kind of requests we are dealing with. A Node.js app that isn't doing CPU-intensive work can serve thousands of concurrent connections with better performance than a thread-based server.
With multithreaded programs, multiple threads can run the same piece of code concurrently or in parallel, and they can do so in a thread-safe or a non-thread-safe way. Next, we will figure out what problems can arise and how we can avoid them.
Thread safety is a concept that means different threads can access the same resources without producing unpredictable results like a race condition or a deadlock.
Let’s see an example where we want to implement a Singleton pattern on initializing a database connection. Simply put, initialize a database connection object only once and use that object whenever we want to interact with the database.
A critical section is any piece of code that can be executed concurrently by more than one thread and that shares data or resources used by the application. A naive singleton implementation works in a single-threaded program, but it has a race condition at the point where it checks whether the instance already exists.
If multiple threads reach that check at the same time, they "race" to read the instance variable, all observe that it is nil, and then each one initializes its own connection to the database, which might exhaust the database's connection pool.
Output in a race condition is hard to predict and inconsistent.
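Since the original snippet is not shown inline here, below is a minimal Go sketch of such an unsynchronized singleton (the `Conn` type, the `created` counter, and the names are assumptions for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// Conn stands in for a real database connection.
type Conn struct{ id int }

var (
	instance *Conn
	created  int // how many "connections" were opened, for demonstration
)

// GetInstance has a race condition: two goroutines can both observe
// instance == nil before either one assigns it, so each creates its
// own connection.
func GetInstance() *Conn {
	if instance == nil { // the racy check inside the critical section
		created++
		instance = &Conn{id: created}
	}
	return instance
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); GetInstance() }()
	}
	wg.Wait()
	// Under contention this can print a number greater than 1,
	// and `go run -race` will flag the data race.
	fmt.Println("connections created:", created)
}
```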
There are several methods for avoiding race conditions to achieve thread safety. The first method is Locking or Mutex.
A mutex, as the name mutual exclusion suggests, gives one thread exclusive access to a resource and blocks the others. All threads may eventually use the resource, but only one thread is allowed to use it at a time.
For example, Thread A tries and successfully acquires the lock, then goes on to create the singleton database connection object. Meanwhile, when Thread B reaches the same point, it has to wait to acquire the lock, since Thread A is holding it. After Thread A returns, it releases the lock (in Go, the defer keyword postpones the Unlock call to the very end of the function). Only then can Thread B successfully acquire the lock and check whether the instance variable is nil; since it has already been assigned by Thread A, Thread B won't initialize the singleton again. It then releases the lock.
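A minimal Go sketch of that mutex-protected singleton (again, the `Conn` type and names are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

type Conn struct{}

var (
	mu       sync.Mutex
	instance *Conn
)

// GetInstance is thread-safe: the mutex ensures that only one
// goroutine at a time can check and initialize instance.
func GetInstance() *Conn {
	mu.Lock()
	defer mu.Unlock() // released when GetInstance returns
	if instance == nil {
		instance = &Conn{}
	}
	return instance
}

func main() {
	var wg sync.WaitGroup
	conns := make([]*Conn, 50)
	for i := range conns {
		wg.Add(1)
		go func(i int) { defer wg.Done(); conns[i] = GetInstance() }(i)
	}
	wg.Wait()
	allSame := true
	for _, c := range conns {
		if c != conns[0] {
			allSame = false
		}
	}
	fmt.Println("all goroutines got the same instance:", allSame) // true
}
```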
There is a cleaner way to do this in Go: sync.Once.
See also: sync.Once implementation
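With sync.Once, all the locking logic is delegated to the standard library; the function passed to `Do` runs exactly once, no matter how many goroutines call it concurrently. A sketch (the `initCount` counter is added only to demonstrate this):

```go
package main

import (
	"fmt"
	"sync"
)

type Conn struct{}

var (
	once      sync.Once
	instance  *Conn
	initCount int // how many times the init function actually ran
)

// GetInstance relies on sync.Once: the closure below runs at most
// once, even under heavy contention.
func GetInstance() *Conn {
	once.Do(func() {
		initCount++
		instance = &Conn{}
	})
	return instance
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); GetInstance() }()
	}
	wg.Wait()
	fmt.Println("init ran", initCount, "time(s)") // always 1
}
```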
I found it very interesting to read why its implementation doesn't simply use compare-and-swap (CAS) to check whether the action has already finished. We will discuss CAS below.
When acquiring a lock, we must carefully consider whether a deadlock could happen.
Deadlocks happen when two or more threads aren’t able to make any progress because the resource required by the first thread is held by the second and the resource required by the second thread is held by the first.
For example, suppose a multithreaded program needs to write to two resources. Thread 1 is writing to Resource 1, blocking other threads from writing to it; then it tries to write to Resource 2 but is blocked by Thread 2. Meanwhile, Thread 2 is waiting for Resource 1 to be released while Thread 1 is waiting for Resource 2, and a deadlock occurs.
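The scenario above can be sketched in Go, along with the classic fix of always acquiring locks in the same global order (the function names here are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	resource1 sync.Mutex
	resource2 sync.Mutex
)

// transferBad illustrates the deadlock-prone pattern: Thread 1 calls it
// as (resource1, resource2) while Thread 2 calls it as (resource2,
// resource1). If each acquires its first lock before the other's second
// Lock call, both block forever. (Not executed here, since it can hang.)
func transferBad(first, second *sync.Mutex) {
	first.Lock()
	defer first.Unlock()
	second.Lock() // may wait forever for the lock the other thread holds
	defer second.Unlock()
}

// transferSafe avoids the deadlock by always locking in one fixed
// global order (resource1 before resource2), regardless of the caller.
func transferSafe() {
	resource1.Lock()
	defer resource1.Unlock()
	resource2.Lock()
	defer resource2.Unlock()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); transferSafe() }()
	}
	wg.Wait()
	fmt.Println("done without deadlock")
}
```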
To detect and avoid deadlock, the OS must be given additional information in advance about which resources a process will request and use during its lifetime. A deadlock-avoidance algorithm analyzes each request, ensuring there is no possibility of a future deadlock if the requested resource is allocated.
Other than a deadlock, a program thread can also experience starvation, where it never gets access to the wanted resource to process its work. Starvation can be caused by errors in scheduling. For example, if a (poorly designed) multi-tasking system always switches between the first two tasks while a third never gets to run, then the third task is being starved of CPU time.
“Atom” comes from the Greek “atomos” = “uncuttable”, meaning “indivisible smallest unit”. In operating systems, an atomic operation is one that effectively happens all at once or doesn’t happen at all. It can’t stop in the middle and leave an inconsistent state. No side effects of an atomic action are visible until the action is complete.
In concurrent programming, atomicity is equivalent to linearizability, which has the additional property that none of its effects are visible until after it completes. That is, there are no intermediate states visible to other threads. (In database systems, this property is classified separately as “isolation”.) Atomic operations run by multiple threads will always appear to occur one after the other, sequentially.
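A simple way to see atomicity in Go is a shared counter. A plain `counter++` is actually three steps (load, add, store) and can lose updates under contention, while `atomic.AddInt64` is indivisible (the function name `atomicCount` is just for illustration):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicCount increments a shared counter from n goroutines using
// atomic.AddInt64. Because each increment is indivisible, the result
// is always exactly n; a plain counter++ under the same workload
// could lose updates, because its separate load, add, and store
// steps can interleave between goroutines.
func atomicCount(n int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1)
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(atomicCount(1000)) // always prints 1000
}
```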
Most processors provide an atomic compare-and-swap (CAS) instruction that reads from a memory location, compares the value with an “expected” one provided by the user, and writes out a “new” value if the two match, returning whether the update succeeded.
Here is a quick playground: https://go.dev/play/p/RTEv3UtGBYx
For the thread that succeeds in swapping the atomicinz variable from 0 to 1, only that thread gets to create the singleton. The other threads fail to swap the variable, so they keep running in a loop until the singleton is available.
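A hedged sketch of that CAS-based, lock-free singleton (the names `started` and `instance` are assumptions, not the exact playground code; `atomic.Value` is used so publishing the pointer is also race-free):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

type Conn struct{}

var (
	started  int32        // 0 = nobody has claimed initialization yet
	instance atomic.Value // holds the *Conn once it is published
)

// GetInstance uses compare-and-swap instead of a mutex. Exactly one
// goroutine wins the CAS from 0 to 1 and creates the singleton; the
// losers loop until the winner publishes it.
func GetInstance() *Conn {
	if atomic.CompareAndSwapInt32(&started, 0, 1) {
		instance.Store(&Conn{}) // only the CAS winner runs this, once
	}
	for { // losers retry until the singleton becomes available
		if c, ok := instance.Load().(*Conn); ok {
			return c
		}
		runtime.Gosched() // yield instead of burning the whole time slice
	}
}

func main() {
	var wg sync.WaitGroup
	conns := make([]*Conn, 20)
	for i := range conns {
		wg.Add(1)
		go func(i int) { defer wg.Done(); conns[i] = GetInstance() }(i)
	}
	wg.Wait()
	same := true
	for _, c := range conns {
		if c != conns[0] {
			same = false
		}
	}
	fmt.Println("all goroutines got the same connection:", same) // true
}
```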
Other than CAS, we can also atomically Load and Store a pointer value.
See also: sync/atomic documentation.
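The Load/Store pattern is handy when one writer periodically swaps in a fresh snapshot that many readers consume. A small sketch using `atomic.Pointer` from Go 1.19+ (the `Config` type is a hypothetical example):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical shared value that readers need a
// consistent snapshot of.
type Config struct{ MaxConns int }

// current holds a *Config; atomic.Pointer guarantees that Load always
// returns either the old or the new pointer, never a half-written
// value, without any mutex.
var current atomic.Pointer[Config]

func main() {
	current.Store(&Config{MaxConns: 10})
	fmt.Println(current.Load().MaxConns) // 10

	// A writer swaps in a whole new snapshot atomically; concurrent
	// readers keep working with whichever pointer they loaded.
	current.Store(&Config{MaxConns: 20})
	fmt.Println(current.Load().MaxConns) // 20
}
```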
Blocking/Lock vs Non-Blocking/Lock-Free algorithms
We know from the previous article about processes and threads that a thread is allowed to compute on the CPU for a certain amount of time; then it gets suspended and swapped out of the CPU so another thread can do its work.
For an OS-level locking operation, after a thread acquires a lock, what happens if its CPU time is up and it gets suspended? Well, all the other threads just sit there and wait for the lock to be released, while the thread holding the lock makes no progress because it isn't scheduled. Also, whenever a thread is swapped in and out of execution, there is context-switching overhead for saving and restoring its state.
On the contrary, threads doing atomic operations don't wait; they keep retrying until they succeed, with no context-switching overhead. If the OS suspends one thread, the others can still keep going; when the suspended thread is restored, it just loads the new value and retries the atomic operation.
In general, atomic operations are faster when contention between threads is sufficiently low. And if you ever use locking operations, be sure to know what kind of lock you are using: prefer locking built on CAS (like the code example above) over OS-level locking, so that threads don't get suspended while holding the lock.
In this article, we’ve covered lots of multithreading topics:
- Pros/cons of Multithreaded programs
- The difference between Concurrency and Parallelism, plus the reason why Node.js is single-threaded
- Problems with Multithreaded programs: Race Conditions, Deadlock, Starvation
- How to achieve Thread-safety: Locking (Mutex), Semaphore, Atomic operation e.g. compare-and-swap (CAS)
- Blocking/Lock vs Non-Blocking/Lock-Free algorithms
If you got to this point, thanks for reading. I hope you found this helpful and learned something new!