There are so many places and ways to learn to program, we only need to know where to look. Yet, we often neglect legacy code.
You’re about to enter another dimension. A dimension not only of sight and sound, but of mind. A journey into a wondrous land of imagination. Next stop: The Legacy Code.
For those who didn’t recognize it, or didn’t watch the show, this is (altered) intro to The Twilight Zone episode. It is a cult SF/horror TV show. Each episode tells a stand-alone story of characters that find themselves dealing with often disturbing or unusual events leading to the surprising ending.
Our journey through the lines of legacy code could feel quite similar sometimes. But just like in episodes of The Twilight Zone, in every legacy code mystery there is a lesson to be learned.
Programming can be learned in many ways. But no matter how we did it, we didn’t do it on our own. In our journey, there were a lot of books, online courses, and documentation. While going through these materials we stumbled upon a bunch of advice on how things could be done and what should be avoided. All those pieces of advice are an accumulation of experience from the people who wrote those materials, and many others that shared their experiences before them.
This is what I like to call borrowed knowledge. It is not our knowledge, we are “borrowing” it. We are doing things in a certain way, but that doesn’t guarantee that we understand what we are doing.
– We are doing microservice architecture.
– Why? What problem are we trying to solve? What are the benefits? What are disadvantages? Are we really doing microservice architecture or are we doing distributed monolith? Do we need microservices?
This is probably the easiest example. Microservices are lately used like mantra. It is presented as the ultimate solution to all of our problems. Don’t get me wrong, microservice architecture is a beautiful pattern, but it has its own place.
Until we know what that place is, microservices are not our knowledge. It is something that someone else told us to do. And recommending it would be like recommending a movie that we did not watch. We put our trust in someone else’s taste in movies. We are “borrowing” it.
This is not necessarily a bad thing. Take into consideration that we work to live, not live to work. So knowing everything is not the ultimate goal that we should strive for. That is why we have opinionated technologies and coding standards. To show us the way and to eliminate the need to learn things the hard way. We do not need to get hit by a car to learn to look both ways when crossing the street.
Having a structure of a project, a tool or a pattern that we can simply “plug” into our solution without much fuss is great. But what definitely should be our goal is to understand what we are doing, why, and when we should not do it that way.
How does legacy code fit in this story? What is a better way to see why we should or should not do something than seeing it in action? Sometimes code is just a victim of time or circumstances, but in most cases legacy code is a result of bad decisions.
It is harder to notice those bad decisions in the beginning. When we are starting new projects, we are mostly focused on what needs to be done. As time passes and software is used, bad decisions begin to surface. And there is no better teacher than a bad decision.
In software development we should avoid tightly coupling as much as possible. But do we really understand why? And how can we achieve this? Sometimes it is OK to tightly couple things if we know what we are doing.
For example, if we use a certain ORM it is fine to directly use it in business logic… if we know for certain that we are not going to change it. If we are not that certain then we should put a level of abstraction between our business logic and data access layer by using repositories, data access objects or something similar.
If we have ever been in a position where we use an ORM that is out of support, a custom solution that no longer meets our needs, or simply majority of employees do not know how to use it (and there is no good documentation), then we definitely understand why we need that level of abstraction. Changing ORM would be a nightmare, and staying on existing ORM limits us in what we can do. This is a lose-lose situation.
Another example, so characteristic for legacy code, would be code duplication. We all (should) agree that code duplication is a bad thing. Yet there are so many developers that still duplicate code. Sometimes even on a massive scale. Any of us that have ever fixed some bug, and there is still faulty behavior in production because there are two, three, or more copies of the same code containing the bug, know why we shouldn’t duplicate code.
In a new codebase, we might get away with it because there is still not that much code, and we still remember every place that we need to change, but that doesn’t make it less wrong.
All of this is well said, but it is too abstract to show what I am really talking about. So let me describe one of my experiences with legacy code.
Early in my career I came to work on a certain project. It was version 2 that was already 4 years into development at the time. It was a MVC project covering a variety of functionalities, and there was a substantial amount of code. It had over 100 tables, and more than half of those entities needed to be able to have attachments. And that feature (attachments) is a feature I would like to cover in this section.
To speed up development, the original team used generic controllers and code generation. Among other things, they generated views for CRUD operations for most entities. These views would be generated and then altered according to requirements. Most entities needed to have attachments. To address this they have put code for handling display, upload and modifying attachments inside the view template.
Lesson: Duplicated code is hard to maintain and should be avoided
Since over 70 entities needed to have attachments, logic related to that feature was generated in over 70 places. There was a list of formats that were supported by the system for upload purposes. When a new format was required I needed to go through all those 70+ places to update business logic.
Once I did that, the new format was supported. But after some time the user complained that it does not work. It took me some time to figure out I forgot to update the view template. Next view that was generated after the change did not contain the newly supported format. It was a dumb mistake, but in my defense, I was pretty green at the time.
Lesson: If you generate code do not generate duplicated code. Or, if you do, make sure that you can later regenerate it to reflect new logic.
To be honest, it wasn’t all my fault. The view template should not have contained the view part for attachments. That code was not a candidate for code generation. It should have been a separate component that is only referenced in the view. Although it was generated code, duplicated code is still duplicated code.
If views could have been overridden by generating a new version of it, there would not be any problem. Newly generated versions would contain changes and everything would work as expected. But views were altered with manually written logic, so that was not the option.
Lesson: Generics are great. Use it when you can, but do not force it.
I decided to create components for attachments anyway. Replacing all that duplicated code might not be possible right away, but I could use it in the new code. This way I’m not creating more work for myself once I start switching things later.
Up until then, I worked on projects that used generics, but I personally have never written a line of generic code. Even though I knew what generics are, I didn’t understood their full potential.
The original idea behind the component was to have a unified view part. But as I started to use it, I recognized that every time I reference it in view I’m duplicating code in the controller too. It might be a good idea to push that fetching logic inside the component then.
This was an eye-opening realization for me. With help of reflection and generics, I managed to create an encapsulated component that handled fetching, displaying, and authorizing action on attachments.
But some pages needed to have a specific list of attachments that are not related just to a specific entity. I tried to push this into the component, and it became messy fast. So I decided to split the view and logic part of the component, and use only the view part. The business part was manually written every time because it did not fit into generic logic.
Lesson: Know the tools you are using and what your code is doing under the hood
At some point, users started to complain about the performance of a certain page. Again, to speed up initial development, the original development team decided to fetch all data on page load. Paging, filtering, and ordering was done on the client side.
With time the number of rows grew by over 5000. By itself, it was not the problem. What really killed performance was attachments.
It looked like someone concluded (just like I did when I was extracting the component) that there is a lot of code duplication when fetching attachments.
To prevent this duplication, eager loading was turned on. For each entity, ORM would load all of its attachments by default. Relations between entity and attachments were one-to-many, so ORM could not fetch those attachments within the same query. This meant that for each entry (row) ORM would make an additional request to the database.
Since the original request needed to fetch two entities to display the data in the view, the result for 5000 rows was over 10 000 requests to the database on page load. And attachments were not even used on this page.
I couldn’t simply turn off eager loading because it was used in too many places, and I couldn’t be 100% sure that I covered them all. At the end, “the solution” was to go through all pages that display lists, and manually turn off eager loading for attachments.
Maintaining legacy code is often depressing. Keeping things running while dealing with a bunch of other people’s (or yours) bad decisions get frustrating fast. Every mail, call, or message we receive means that something is not working and we need to fix it. It is hard not to get affected by that.
But working with legacy code does not need to be a nightmare. We can look at it as an opportunity to sharpen our skills. Every code is the final product of a chain of decisions made to solve some problem. By knowing when, why and how things went the wrong way, we can learn a great deal about software development. And just maybe the code that we left behind will be better than the code that we inherited.