Improving Backend Application Performance | by Naveed Khan | Aug, 2022

Strategies to identify and improve the performance of backend applications

NewRelic APM Dashboard

Over the years, I have used a concoction of strategies to improve backend application performance, on apps ranging from a couple of users a day to thousands of concurrent users.

In my opinion, backend systems should always be designed in a scalable and cost-effective manner regardless of the number of users hitting the servers. This is key to providing a great user experience while not costing a fortune to run the infrastructure. Below are some of the strategies that worked well for me.

Note that I will not be discussing strategies that involve scaling horizontally by adding more servers or vertically by adding more powerful system hardware. If you are interested, check out my post about System Design which covers that topic.

All right, let’s get started.

At its core, application performance improvement work revolves around measuring your application for speed and resource usage. It’s a common misconception that profiling requires expensive or complicated software tools that take a great deal of effort to set up and use.

While this may be partly true for some applications, in my experience, backend apps lend themselves nicely to simple techniques, such as adding start-to-end time reporting in code, recording memory usage at different stages of execution, or using tools like Apache Bench or the Unix time command on the terminal to get the data. Here is an example of using the time command to estimate how much time it takes to hit a particular API endpoint:

% time curl -s
-output removed for brevity-
0.01s user 0.00s system 23% cpu 0.064 total

Establishing these baseline numbers is a very important first step. Without the baseline, it’s hard to tell if the changes you are incorporating are helping or making the situation worse.
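The start-to-end time reporting and memory recording mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not the tooling used in the original projects; the `measure` helper and the `build_report` workload are hypothetical names introduced only for this example.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label, results):
    """Record wall-clock time and peak Python memory for the wrapped block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[label] = {"seconds": elapsed, "peak_bytes": peak}


def build_report(n):
    # Hypothetical workload standing in for a request handler.
    return [str(i) * 10 for i in range(n)]


baseline = {}
with measure("build_report", baseline):
    report = build_report(50_000)

# Keep these numbers around and compare them after every change.
print(baseline)
```

Storing the results in a dictionary keyed by label makes it easy to log the same measurements before and after a change and compare them against the baseline.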

In addition, there are several great paid Application Performance Management (APM) tools, such as NewRelic, DataDog, and AppDynamics, that provide detailed insights into historical data.

DataDog APM Dashboard

Before signing up for sophisticated and, in some cases, expensive performance improvement tools, I recommend simply reading and analyzing your code. This is super helpful when it’s hard to get into the actual environment. During the development of Underworld Empire, we ran into a performance issue while an Event in the game was active. Of course, we could not reproduce it in our development environment, but we figured it out by eyeballing the code. This may be hard to do when the code base is very large; in that case, use the other tips in this article to help narrow your focus to the relevant parts.

IO is the root cause of performance degradation more often than anything else. A simple yet powerful trick is to use the multi-get variant of whatever fetch method you are calling. Whether you are dealing with file IO or remote data sources, fetching items one at a time adds a lot of overhead. Doing a bulk fetch can improve this drastically. Check out the benchmarking done by Rich in his Redis get pipeline vs mget post to see this in action.
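To make the overhead concrete, here is a small sketch that counts round trips instead of measuring real network time. `FakeRemoteStore` is a hypothetical in-memory stand-in for a remote key-value store, not a real client; its `mget` method only mirrors the idea of a multi-get call.

```python
class FakeRemoteStore:
    """In-memory stand-in for a remote key-value store; counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One call, one simulated network round trip.
        self.round_trips += 1
        return self._data.get(key)

    def mget(self, keys):
        # Many keys, still a single simulated round trip.
        self.round_trips += 1
        return [self._data.get(k) for k in keys]


store = FakeRemoteStore({f"user:{i}": f"name-{i}" for i in range(100)})
keys = [f"user:{i}" for i in range(100)]

one_by_one = [store.get(k) for k in keys]
trips_single = store.round_trips

store.round_trips = 0
bulk = store.mget(keys)
trips_bulk = store.round_trips

print(trips_single, trips_bulk)  # prints "100 1"
```

Both approaches return the same values; the difference is that each round trip to a real remote store pays network latency, so 100 trips versus 1 usually dominates the total time.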

I’ve combined background processing and multithreading into one tip because the general idea is similar: doing work in the background while you perform other tasks, or starting multiple background tasks in parallel and waiting for them all to finish, is almost always faster than doing everything one by one. Note that multithreading does pose some issues relating to race conditions that may lead to time-consuming debugging sessions. Use this judiciously and always think about the thread safety of not only your own code but also the libraries you are using.
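As a rough sketch of the "start several tasks in parallel and wait for all of them" idea, the following uses Python's standard `concurrent.futures` module. The `fetch` function and the task names are hypothetical, and `time.sleep` stands in for a network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name, delay=0.1):
    # Hypothetical IO-bound task; sleep stands in for waiting on IO.
    time.sleep(delay)
    return f"{name}-done"


tasks = ["profile", "orders", "inventory", "pricing"]

# One by one: total time is roughly the sum of all delays.
start = time.perf_counter()
sequential = [fetch(t) for t in tasks]
sequential_seconds = time.perf_counter() - start

# In parallel: total time is roughly the longest single delay.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    # map() returns results in the same order as the inputs.
    parallel = list(pool.map(fetch, tasks))
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Each task here is independent and shares no mutable state, which is what keeps the example free of race conditions; tasks that write to shared data would need locking or a different design.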

In other words, don’t fetch the same data twice in your code. Anything fetched from disk or a remote source should be saved to an application-level key-value store, which may just be a simple global array with accessors. I won’t be surprised if using a global variable rubs you the wrong way. Personally, I completely condone it when used judiciously with wrapper accessors.

The following example shows one possible implementation:

Caching using the app process memory

This technique is extremely useful when dealing with data that does not exist on the same machine where your application is running. Any modern clustered environment can benefit from it. This could be as simple as using a local cache file or something similar to APCu in PHP. It is important to understand that if the server is recycled, you will lose the cached data.

In this strategy, an external memory store like Memcached or Redis is used to hold data, so you don’t have to fetch it again from your database or storage. This is often referred to as the Cache Aside Design Pattern. I have talked about Redis at length in the past. Here is a code example that shows this in action:

Code example of using Redis as a DB cache

When using relational databases, query performance analysis is relatively straightforward, and most RDBs provide ways to do it. Fortunately, there are well-established practices here, and in some cases, creating and using the right indexes is all that’s needed. If you are interested, I recommend reading How to Optimize MySQL Queries for Speed and Performance by the Alibaba Cloud team as a starting point.

For non-relational databases, creating and managing your own indexes, so that you can fetch the relevant data directly using a key, is much faster than running a scan query. Write a comment below if you are interested in this topic, and I will be happy to explain.
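One way to picture a self-managed index is a second mapping that is updated on every write and points from a lookup value to the matching record keys. The sketch below uses plain Python dictionaries as a stand-in for a key-value database; the record layout and function names are hypothetical.

```python
from collections import defaultdict

records = {}              # primary store: record key -> record
index = defaultdict(set)  # secondary index: customer -> set of record keys


def put_order(key, record):
    """Write a record and keep the customer index in sync."""
    old = records.get(key)
    if old is not None:
        index[old["customer"]].discard(key)  # remove the stale index entry
    records[key] = record
    index[record["customer"]].add(key)


def find_by_customer(customer):
    # Direct key lookup: cost depends on the number of matches only.
    return sorted(index.get(customer, set()))


def scan_by_customer(customer):
    # Full scan: touches every record in the store.
    return sorted(k for k, r in records.items() if r["customer"] == customer)


put_order("order:1", {"customer": "alice", "total": 30})
put_order("order:2", {"customer": "bob", "total": 12})
put_order("order:3", {"customer": "alice", "total": 99})

print(find_by_customer("alice"))  # prints "['order:1', 'order:3']"
```

The trade-off is that every write now has to update the index as well, so this pays off when reads by that field are far more common than writes.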

When accessing data over the network, whether it’s a client accessing your web server or you accessing a database, network congestion can be a major contributor to latency, and compressing the data you send helps. Most servers support compression such as gzip, but you may need to turn it on and ask your client to request it. If you are interested, the Digital Ocean team has a great tutorial on How To Improve Website Performance Using gzip and Nginx on Ubuntu 20.04.
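To get a feel for how much gzip, the compression named in the tutorial title above, can shrink a typical API response, here is a small sketch using Python's standard `gzip` module. The JSON payload is made up for illustration, and real savings depend on how repetitive your data is.

```python
import gzip
import json

# Hypothetical API response: repetitive JSON compresses well.
payload = json.dumps(
    [{"id": i, "status": "active", "region": "us-east"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

print(f"raw: {len(payload)} bytes, gzip: {len(compressed)} bytes")
```

Over HTTP, a client asks for this by sending an `Accept-Encoding: gzip` request header, and a server that compresses the body marks the response with `Content-Encoding: gzip`.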

When communicating between two servers, for example, in the case of microservices where a single server does not implement all functionality, using a binary or otherwise compact serialization format helps. Google’s FlatBuffers, for example, is a great solution available for a lot of languages.

I should mention that we had trouble making it work for PHP.
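The sketch below is not FlatBuffers; it uses Python's standard `struct` module as a simple stand-in to show why a fixed binary layout is more compact than the same record sent as JSON text. The record fields and the layout are hypothetical.

```python
import json
import struct

# Hypothetical fixed layout, little-endian with no padding:
# unsigned 32-bit item id, 64-bit float price, unsigned 16-bit quantity.
RECORD = struct.Struct("<IdH")


def encode(item_id, price, quantity):
    return RECORD.pack(item_id, price, quantity)


def decode(data):
    return RECORD.unpack(data)


as_json = json.dumps({"item_id": 1234, "price": 19.99, "quantity": 3}).encode("utf-8")
as_binary = encode(1234, 19.99, 3)

print(len(as_json), len(as_binary))
```

The binary form is 14 bytes because field names are not transmitted; both sides must agree on the layout ahead of time, which is the role a schema plays in formats such as FlatBuffers.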

Let me know in the comments if you would like to share a strategy that worked for you. If you are interested in similar topics, you may also like my following posts:

