Do you need a database and an API?
I would like to start with a disclaimer, as I am about to launch into the kind of talk that drives people away angry. I don’t want to turn software architecture on its head. Today’s projects are not only complex but usually constrained by unfit tooling, unreasonable deadlines and misunderstood requirements. What I am about to point out may only be a theoretical exercise. Take what you find valuable and leave the rest in the theoretical-exercises bin.
The discussion gets academic very fast. You work with your favorite platform and programming language, with client requirements and with whatever the company decided to invest in. You have a given set of libraries that let you do all sorts of tasks. You probably use a database and some services, and every task you work on relies on these dependencies and constraints. If it’s C#, it’s probably also Entity Framework, SQL Server, Azure, the .NET APIs and so on, and you can do every task inside this sandbox. Life as a software developer is good and predictable: it works.
But behind this little sandbox there is a monstrous creature you forgot about a long time ago. Maybe you used to tame it in your early years, but the art is forgotten and its paths are hidden. Now you rely on platforms, services, the cloud of course, databases, lambda functions: everything is scalable just as your client wanted, everything is testable just as the architect said it should be, everything uses databases just as the administrator said it’s supposed to. But sometimes a file solves a whole problem, bypassing all frameworks, all APIs, all cloud infrastructure. Beyond your application layers there is, and always will be, the operating system.
I started to acknowledge the operating system some time ago, at my previous job. We needed to know when an application was ready to be used after deployment. I don’t remember the exact context: some cross-team requirement. The architect quickly decided to mark the successful deployment with a record in the database and then create an API endpoint to query it. Easy, if it weren’t for the ugly back end. I never liked adding stuff to it, and the same went for the database.
I thought a bit and offered a simpler solution: if the deployment succeeded and the application was ready to work, let the build process add one more file to the deployment folder as a sign of success. The API could just check whether the file exists and answer accordingly. The architect pondered a bit and offered an improvement: no need for a new file, since the build process already created something in the deployment folder that could serve as the ready-to-use mark. He was happy because the solution was less complex, and I was happy too: it was much easier to implement.
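To make the idea concrete, here is a minimal sketch in Python. It is not the code we actually used, and the deployment folder path and the `READY` marker name are hypothetical (in our case the architect reused a file the build already produced). The last step of the build touches the marker, and the API’s readiness check is a single existence test.

```python
from pathlib import Path

# Hypothetical values: adjust to your own deployment layout.
DEPLOY_DIR = Path("/srv/myapp/current")
READY_MARKER = "READY"


def mark_ready(deploy_dir: Path = DEPLOY_DIR, marker: str = READY_MARKER) -> None:
    """Last step of the build/deploy script: leave a sign of success."""
    (deploy_dir / marker).touch()


def is_ready(deploy_dir: Path = DEPLOY_DIR, marker: str = READY_MARKER) -> bool:
    """Readiness check for the API: a stat() on file-system metadata;
    the file's content is never read."""
    return (deploy_dir / marker).is_file()


def readiness_response(deploy_dir: Path = DEPLOY_DIR, marker: str = READY_MARKER) -> tuple[int, str]:
    """HTTP-style status the endpoint could return (200 = ready, 503 = not yet)."""
    return (200, "ready") if is_ready(deploy_dir, marker) else (503, "not ready")
```

If the deployment dies before its last step, the marker never appears and the endpoint keeps answering 503: the failure is visible without any extra bookkeeping.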
The solution was also sturdier. Creating a file is a low-level operating system operation. If it fails, you know for sure something went horribly wrong, and if you tie this operation to an application deployment, a failure clearly means the deployment failed. Marking the deployment as completed in a database works differently. The deployment might have succeeded but the database could not be reached. The deployment might have gone wrong and the database might not reflect it correctly. The communication between the application server and the database server may have failed. Many things can go wrong.
Checking for a file in a folder is one of the most fundamental operations of any operating system. It’s also very fast, as it only touches file-system metadata: it doesn’t even require reading the file. Not so for databases. Not to mention that a database will probably need an index for fast lookups, which, again, uses extra resources.
A friend was working on a quiz app. He had complex quizzes with hierarchical questions and dependencies. The initial architecture wasn’t challenged: the questions were going into the database. Again, I suggested using text files as an alternative: they would be easy for anybody to edit, no API would be needed, and they are faster to deploy and build. His final solution was even better: markdown files.
But let’s think about the management UI for a moment. The questions are in text files, and every operating system already comes with a text editor. Do you really need to build APIs and an extra UI for adding and removing questions? Do you need a custom UI to create a new quiz from existing questions? Yes and no, of course: it depends on the editor and on the features you need, but in the best case you need nothing and a text editor is enough. Courtesy of the operating system. And for markdown there are even better editors.
Let’s think about the architecture too: questions depending on other questions can be placed in a sub-folder. When the user picks a question, the questions in its dependency folder are loaded too and must be completed as well. For validation, you can use codes or markdown features such as footnotes or annotations.
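Here is a rough sketch of that layout in Python, one question per markdown file. Two conventions are hypothetical, chosen only for illustration: the questions that depend on `q1.md` live in a folder named `q1/`, and the expected answer is stored in a footnote labelled `[^answer]`, which is stripped before the question is shown.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

# Hypothetical convention: the expected answer is a markdown footnote labelled
# "answer", e.g. "[^answer]: Paris". It is removed from the displayed text.
ANSWER_FOOTNOTE = re.compile(r"^\[\^answer\]:[ \t]*(.+)$", re.MULTILINE)


@dataclass
class Question:
    path: Path
    text: str                 # markdown shown to the user
    answer: Optional[str]     # None when the file has no answer footnote
    dependencies: List["Question"] = field(default_factory=list)


def load_question(md_file: Path) -> Question:
    raw = md_file.read_text(encoding="utf-8")
    match = ANSWER_FOOTNOTE.search(raw)
    question = Question(
        path=md_file,
        text=ANSWER_FOOTNOTE.sub("", raw).strip(),
        answer=match.group(1).strip() if match else None,
    )
    # Hypothetical layout: questions that depend on q1.md live in the folder q1/
    dependency_dir = md_file.with_suffix("")
    if dependency_dir.is_dir():
        question.dependencies = [load_question(p) for p in sorted(dependency_dir.glob("*.md"))]
    return question


def load_quiz(quiz_dir: Path) -> List[Question]:
    """A quiz is simply a folder of markdown files, one question per file."""
    return [load_question(p) for p in sorted(quiz_dir.glob("*.md"))]


def check(question: Question, reply: str) -> bool:
    """Case-insensitive comparison against the answer footnote."""
    return question.answer is not None and reply.strip().lower() == question.answer.lower()
```

Adding a question means creating a file; making it depend on another means moving it into a folder. Both are done with the editor and file manager you already have.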
Is it worth removing the database to have all this? I would gladly remove an application layer to simplify the architecture. Would it make development more complicated? It will surely make you think and plan a lot more, but if you find solutions for all the required features, development should be easier, not harder, and the application will share responsibilities with the operating system.
I was working on an application that required critical logging. I guess you can predict the pattern by now: the architect decided, without thinking twice, to use a table for logging. I started asking questions: what if the table is busy? What if the database is inaccessible? What if the connection is cut? Sure, the architect defended their position: the database is ACID compliant, a write is a write, we can use transactions, and so on. The reality was that, although all of this was true, we had no way to make sure the logs even reached the database, let alone that they were stored securely.
The implemented solution was to use the Linux system logs. We added a small library to the project to handle the system log, configured it, and that was it. The logging solution was so effective that we decided to drop database logs altogether and send everything to the system logs. Not only did this improve the application’s speed, since logging to the database required many contexts, requests and waits, but we could now browse the logs with several Linux tools dedicated to log browsing, which are extremely well built and functional, such as lnav, which I wrote about before.
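Our small library belonged to our own stack; as an illustration only, here is roughly the same idea with Python’s standard library, which ships `logging.handlers.SysLogHandler`. On most Linux distributions the local system logger (rsyslog or systemd-journald) listens on the `/dev/log` socket, which is the default address assumed here. The function and logger names are placeholders.

```python
import logging
import logging.handlers


def make_syslog_logger(name: str, address="/dev/log") -> logging.Logger:
    """Return a logger that hands every record to the system logger.

    `address` is the local syslog socket by default; a (host, port) tuple
    sends the records over UDP instead.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_USER,
    )
    # The system logger adds the timestamp and host itself; we only prefix
    # the application name and level so tools can filter on them.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

In the application it is then just `log = make_syslog_logger("myapp")` followed by calls such as `log.info("deployment finished")`. The entries land wherever the system logger stores them (for example `/var/log/syslog` on many Debian-based systems), with no table, connection or retry logic of our own.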
Again, we leveraged a service offered by the operating system. We also used a dedicated tool for browsing logs instead of building another UI to check them. The application got simpler, lost a whole log management UI, and we got to learn Linux system logging and lnav. And let me tell you, both do a much better job of handling logs than we ever could with all our frameworks and libraries.
I have more examples, and maybe one day I will continue this article with a part two, but for now I should stop, because the point is somewhat different. I don’t want to be misunderstood as saying we should stop using the libraries or services provided by our programming ecosystem of choice, or that we should use files for everything. There is a time and a place for files, as there is for the software platform.
What I do want is to make you question the use of platforms, APIs, cloud solutions, ecosystems, databases and the many other trusted, unchallenged architectural decisions we take for granted every day. And, of course, I also want to remind you of your friend, the operating system.
The operating system has an API: the system calls that let you work with files, threads, resources, sockets, message queues and so on. It also has a database: the file system, whose file locks even let you lock parts of a file, some for reading and some for writing. As a developer, you need to know what serves you best in each scenario: is it the programming language, a library, or the operating system? Should you write a row in a table, or would a file suffice?
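To make the point about partial locks concrete, here is a small sketch of byte-range locking in Python on a POSIX system (the `fcntl` module is not available on Windows). `fcntl.lockf` accepts a length and a start offset, so a writer can lock only the bytes it touches while other processes keep reading or writing elsewhere in the same file. The fixed-offset record layout is a hypothetical example. Keep in mind that these locks are advisory and belong to the process: they coordinate cooperating processes, not threads within one process.

```python
import fcntl
import os


def write_range(path: str, offset: int, data: bytes) -> None:
    """Write `data` at `offset`, holding an exclusive lock on just those bytes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive (write) lock on [offset, offset + len(data)); the rest of
        # the file stays available to other processes.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` under a shared lock on that range."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Shared (read) lock: many readers may hold it, a writer must wait.
        fcntl.lockf(fd, fcntl.LOCK_SH, length, offset, os.SEEK_SET)
        try:
            return os.pread(fd, length, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Two processes updating different records of the same file never block each other here; only overlapping ranges do.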