The Traditional One vs. the Modern One
Did you recently start developing or porting an application to Microsoft Azure? If so, one of the things you probably have to decide upon is the database solution you want to use. The Microsoft cloud offers multiple options, with the two most popular being Azure SQL Database (and its two siblings, Azure SQL Managed Instances and SQL Server on Azure VMs) and Azure Cosmos DB.
Recently, I was comparing these two alternatives for my application. I gained some insights I would love to share with my readers. First, let us look at both products separately and see which features they offer.
Azure SQL is Microsoft’s SQL Server offering on Azure. There are three services related to Azure SQL to choose from:
- Azure SQL Database
- Azure SQL Managed Instance
- SQL Server on Azure VMs
Azure SQL Database is the most “managed” service out of the three. It is a PaaS offering enabling you to set up SQL Server quickly and effortlessly. Most of the administration tasks, including backups, updates, and monitoring are taken care of for you. You also don’t need to manage the underlying infrastructure.
There are multiple service tiers to choose from, including a single database option with configurable memory and compute, an elastic pool option that enables you to share resources between multiple databases as well as a serverless option that can scale your resources dynamically depending on use.
SQL Server on Azure VMs simply enables you to run an SQL Server instance on your own provisioned Azure virtual machine. This means you get almost full control over every aspect of the deployment. This also means you will need to care about most of the administration tasks you are used to performing on your on-premises solution.
Azure SQL Managed Instance is a sort of hybrid between these two. On one hand, it enables you to seamlessly transfer your on-premises SQL Server database to the cloud. On the other hand, it gives you many advantages of the cloud, including improved security, automatic patching, version updates, and automated backups.
Since Azure SQL is mostly compatible with SQL Server, it also offers the same transactional guarantees (ACID) as the on-premises product. Due to these strong guarantees, scaling out by adding multiple server instances is mostly limited to read-only replicas. This is also true for the geo-replication feature of Azure SQL.
Pricing for Azure SQL Database is based on the computing and memory you require (vCores) as well as the local storage and backup storage you configure. Overall, the pricing seems easy to understand.
Cosmos DB is the NoSQL offering on Azure. It is designed from the ground up to support extreme scalability, fully embracing the concept of eventual consistency. The main target applications are IoT and global-facing cloud applications. To this end, Cosmos DB offers different consistency levels, data access methods (APIs) as well as scaling options.
Since Cosmos DB is a NoSQL database, it generally conforms well with how data is used by your application. For instance, an “item” can be inserted into the database as a JSON document and the output of a query is a JSON document as well. Databases and containers are used as namespaces over the items with certain additional functions.
To support extreme horizontal scaling, Cosmos DB uses multiple mechanisms. First, Cosmos DB partitions each container according to a partition key. These partitions are automatically spread across physical nodes. Additionally, global distribution of data is achieved by deploying your data across multiple Azure regions. This decreases latency and increases responsiveness across the world. It also provides a failover during regional outages.
Understandably, global distribution comes at the expense of consistency. In contrast to SQL Server which provides strong ACID guarantees by default, Cosmos DB allows you to choose between multiple consistency levels, each providing its guarantees. The most extreme of these, eventual consistency, only guarantees that all replicas will converge at some point in the future.
The pricing model of Cosmos DB is based on request units (RU). 1RU is the cost of a 1KB point-read. Queries are correspondingly more expensive. Request units seem simple enough, but it’s hard to estimate the usage given your particular data access patterns without prior experience.
Having described both solutions, you may ask which one is most suitable for you? Let us discuss some strengths and weaknesses of both databases.
Azure SQL Database Pros
- Flexible deployment options. Whether you want a fully managed service or if you simply want to run SQL Server on Azure’s VMs, Azure SQL Database got you covered.
- Familiar with SQL developers. Since Azure SQL Database is essentially the Azure implementation of the well-known SQL Server, most skills directly transfer to the cloud offering.
- Customizability. If you run your own SQL Server instance, you have control over every setting.
- Transparent pricing. Pricing is based on the required memory, compute, and storage resources.
- Migration. Relatively easy migration of your on-premises SQL Server databases to Azure SQL due to the same underlying engine.
- Stability and reliability. Microsoft SQL Server was initially released back in 1989 and has come a long way since then. It is a mature, stable, and well-tested database engine.
Azure SQL Database Cons
- Fixed schema. Generally, SQL databases are less suited for frequently changing software requirements due to their fixed schema. Schema migration may be hard to manage. Although SQL Server can work with JSON data nowadays (thus bridging the gap between SQL and NoSQL), it requires special syntax and feels a bit like an afterthought.
- ACID guarantees and distributed transactions. While these features can be very useful for some applications, they also drastically reduce scalability in case you don’t need them.
- Limited scaling. While scaling up (by adding more memory or compute to your SQL Server instances) is a quick and easy way to ramp up performance, it may become prohibitively expensive beyond a certain point. Scaling out (by adding multiple instances of SQL Server) is mostly limited to read-only replicas.
Azure Cosmos DB Pros
- Global distribution. Many companies nowadays act worldwide. Their customers expect the same performance from applications regardless of where they are located. This makes the standard SQL database model very inefficient. Having one write-enabled master node with multiple read-only replicas is not sufficient for tasks such as social network feeds or IoT data streams.
- Multiple consistency levels. On the other hand, these applications usually do not need the consistency guarantees of the traditional databases. They can rely on weaker consistency models such as consistent prefix or even eventual consistency. Having the freedom to choose the consistency level is a major benefit of Cosmos DB.
- Auto-indexing support. Every container has an indexing policy set to speed up queries. By default every property is indexed and range indexes are enforced. This is sufficient for most use cases. However, advanced indexing settings are available.
- Multiple APIs. Cosmos DB provides its own Core API as well as multiple alternative database APIs (Gremlin API, Cassandra API, MongoDB API, Table API) that help with migration from other NoSQL databases. This lowers the migration cost and learning curve.
Azure Cosmos DB Cons
- Harder to understand for traditional DB admins. Since Cosmos DB is a NoSQL database and doesn’t necessarily provide strong ACID guarantees (depending on configuration), it is harder to understand for DB administrators used to dealing with traditional SQL databases.
- Complex migration from SQL Server. Migration from a SQL Server-based solution may prove to be very costly and in extreme cases may require rewriting major parts of the application. Subsequent optimization will then take up even more time. I would only advise doing this if the benefits of global scaling outweigh the efforts.
- Hard-to-understand pricing model. The pricing model is based on Request Units which may be unintuitive. The Cosmos DB Capacity Calculator in the portal may help with cost estimates provided you can estimate certain key metrics. In general, Cosmos DB may be quite expensive.
- Public cloud-only. The only place where Cosmos DB is deployed is in Microsoft’s datacenters. Currently, there is no offering for hybrid or on-premises cloud solutions such as Azure Arc.
Both products have a place in the Azure ecosystem. While Azure SQL Database provides easy migration from your on-premises SQL Server solution and helps you to leverage your existing SQL Server knowledge, Cosmos DB was designed from the ground up for global distribution and is fully managed.
If I had to decide which database to use for a new project, I would certainly go with Cosmos DB, as I simply love its scalability and flexibility. As a NoSQL service, interacting with the database simply feels more natural in your application code than working for complex ORM middleware required for traditional databases. Moreover, the choice of multiple consistency levels makes it very easy to choose the right balance between speed and consistency.
My main concern with Cosmos DB is its steep learning curve for administrators used to dealing with traditional SQL databases. Setting up your first Cosmos DB database feels overwhelming. You will have to read through lots of documentation or watch lots of tutorials to figure out how to set up the service to fulfill your specific requirements.
However, I think that this will improve over time as the adoption rate increases. Cosmos DB has already come a long way in a short time, and I can’t wait to see where we are headed next.