What I Learned About Running Database Migrations | by Ardy Gallego Dedase | Jul, 2022

Tips and lessons learned


I’ve worked on a good number of database migrations in the past. Not all database migrations are the same, but there are recurring patterns I’ve noticed.

Recently, I was part of a database migration that lasted about three months. The experience reminded me that the same lessons still apply. I’ll use this blog post to write these lessons down while they’re still fresh in my mind.

I’m not providing an exhaustive list. Some of my tips might seem obvious to seasoned developers who have been through multiple database migrations in their careers.

Technology choices

Questions to ask when making these choices:

  • What existing tools can you use with your application, so you don’t have to write one from scratch?
  • What’s the scale of the migration?
  • Is it a one-off migration, or could it extend for a few more months, i.e., a migration with a long tail?

Build your migration tool from scratch. In the early days, before cloud services were popular, I would run a database migration script from a remote server, hoping there would be no long outages while maintaining data integrity. Ad hoc scripts worked well for one-off and small-scale migrations. I’d write a script from scratch if my application’s toolbox didn’t come with free database migration tools.

Out-of-the-box migration tools. Writing a tool from scratch isn’t the best approach if you have existing tools to reuse, so make use of out-of-the-box tools when they’re available. Some open source frameworks come with their own migration tools; I have used Django’s and Flask’s in the past. These tools have worked well for small- to medium-scale database migrations.

Open source tools usually come with good documentation and an active community. You could check the documentation or the community discussions if you’re stuck with issues that someone else has encountered before.

Workflows. I recently used Cadence workflows with my team to run a database migration. The decision to use Cadence was straightforward because our engineering organization widely supports its usage. Cadence also comes with advanced features like handling retries and timeouts, along with many others. Cadence’s features make it easy to write robust migration workflows. Workflows are also flexible enough to handle the migrations that we need across different databases, including Google Spanner and Firestore.

Provide a dry-run option when running your migration script. Using the dry-run option is useful to debug your migration code’s logic in your local development environment.

The command will look like this:

run-migration --dry-run=true

It allows you to preview what the migration script will do without executing the actual database writes. For example, print out the SQL update query you’re expecting to run, then skip its execution when the dry-run option is true.

A simple example:

logger.Info("running query", query)
if !dryRun {
    // execute the write only when not in dry-run mode
    err = db.Exec(query)
}
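Expanded into a runnable sketch, this dry-run pattern might look like the following. The flag wiring and the exec stub are illustrative assumptions, not the author’s actual code; in a real migration, exec would wrap your database client’s write call.

```go
package main

import (
	"flag"
	"fmt"
)

// runMigration logs each query and, unless dryRun is set, passes it
// to exec (which would wrap db.Exec in a real migration).
func runMigration(queries []string, dryRun bool, exec func(string) error) error {
	for _, q := range queries {
		fmt.Println("running query:", q)
		if dryRun {
			continue // preview only: skip the actual database write
		}
		if err := exec(q); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dryRun := flag.Bool("dry-run", true, "preview queries without executing them")
	flag.Parse()

	queries := []string{"UPDATE users SET plan = 'pro' WHERE id = 1"}
	_ = runMigration(queries, *dryRun, func(q string) error { return nil })
}
```

Keeping the dry-run check in one place, around the single call that performs writes, makes it harder for a code path to accidentally bypass the preview mode.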

Provide an option to limit the number of rows that your migration processes in a single run.

run-migration --limit=100

The limit option is useful for:

  • Testing. You can pick a manageable number of rows to test with.
  • Batching. You can run your migrations in smaller batches, confirm results for each batch, and repeat the process. Small batches help reduce the surface area of any potential issues that might occur.
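The batching loop can be sketched in Go as follows. The helper name and the slice-based paging are assumptions for illustration; a real migration would page through the database with a LIMIT/OFFSET query or a cursor instead of an in-memory slice.

```go
package main

import "fmt"

// processInBatches walks ids in chunks of at most limit rows and calls
// process on each chunk, so results can be confirmed batch by batch.
func processInBatches(ids []int, limit int, process func([]int) error) (int, error) {
	batches := 0
	for start := 0; start < len(ids); start += limit {
		end := start + limit
		if end > len(ids) {
			end = len(ids) // final, possibly smaller batch
		}
		if err := process(ids[start:end]); err != nil {
			return batches, err // stop early so the issue stays contained
		}
		batches++
	}
	return batches, nil
}

func main() {
	ids := make([]int, 250)
	for i := range ids {
		ids[i] = i
	}
	n, _ := processInBatches(ids, 100, func(batch []int) error {
		fmt.Printf("migrated %d rows\n", len(batch))
		return nil
	})
	fmt.Println("batches:", n) // 250 rows at a limit of 100 -> 3 batches
}
```

Returning on the first failed batch is a deliberate choice here: it keeps the blast radius of a bad batch small, which is the point of batching in the first place.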

Logging, monitoring, and alerting

You can go crazy with your logs when testing your migration. However, log only the information you need when running the migration in production. Migrations can involve thousands of rows, and bloating your logging dashboard with redundant information will make it difficult to troubleshoot if there’s an issue.
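One way to keep production logs lean is to emit a single summary line per batch instead of a line per row. This is a sketch; the stats struct and its field names are assumptions, not a prescribed format.

```go
package main

import "fmt"

// batchStats accumulates counters for one batch so a single summary
// line can be logged instead of one line per migrated row.
type batchStats struct {
	processed, updated, skipped, failed int
}

// summary renders the counters as one compact log line for the batch.
func (s batchStats) summary(batch int) string {
	return fmt.Sprintf("batch %d: processed=%d updated=%d skipped=%d failed=%d",
		batch, s.processed, s.updated, s.skipped, s.failed)
}

func main() {
	s := batchStats{processed: 100, updated: 97, skipped: 2, failed: 1}
	fmt.Println(s.summary(1))
	// prints: batch 1: processed=100 updated=97 skipped=2 failed=1
}
```

Per-row details can still go to debug-level logs locally, while production dashboards only see the batch summaries.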

Revisit your monitoring dashboards, metrics, and alerting criteria. If you fine-tune your monitoring and alerting tools in advance, you should be able to spot issues caused by the migration quickly.

Sandbox or test environment

Test your migration in a sandbox environment that closely resembles your production environment. No matter how much time you spend testing locally, you’ll most likely discover issues you didn’t catch in your local environment, such as failed API calls or unexpected values.

Front-load requests or dependencies

If your organization requires it, request what you need from your DevOps or Infrastructure team ahead of time, for example, creating a sandbox database instance. You don’t want the start of your migration blocked by dependencies on other teams.

Test plan

Prepare the test cases that will help you confirm that the migration didn’t break your database’s integrity or introduce bugs. For example, test that the list you display on the web application still loads as expected and that there are no duplicate entries that will cause an API error.
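The duplicate-entry check mentioned above can be a small helper in a post-migration test. This is a sketch; the string ID type is an assumption.

```go
package main

import "fmt"

// hasDuplicates reports whether any ID appears more than once, a
// simple integrity check to run against migrated data.
func hasDuplicates(ids []string) bool {
	seen := make(map[string]bool, len(ids))
	for _, id := range ids {
		if seen[id] {
			return true
		}
		seen[id] = true
	}
	return false
}

func main() {
	fmt.Println(hasDuplicates([]string{"doc-1", "doc-2", "doc-1"})) // true
	fmt.Println(hasDuplicates([]string{"doc-1", "doc-2", "doc-3"})) // false
}
```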

Add buffer when estimating your tasks

Always factor in Murphy’s law when executing database migrations regardless of the migration’s scale. When estimating points for user stories or tasks related to migration, add a buffer for unexpected issues and/or the worst-case scenario.

The long tail of a migration

A migration’s long tail affects your team’s capacity, so you should address it in advance. Set expectations with your team and stakeholders if you expect the migration could extend beyond a quarter. Write down the tasks needed to tackle the long tail, follow up on them during sprint planning, and make sure they don’t fall through the cracks.

One-off migrations

For one-off migrations, you have the option to get your entire team to focus on working on the migration in a couple of sprints. Getting your entire team involved works well if you can break down the migration tasks in such a way that your team can work on them in parallel.

If a single focused push is not doable, you could break the migration down into smaller stories and tasks that a couple of developers work on each sprint over the quarter.

No need to be super nitpicky in code reviews

While I still encourage good coding practices, there’s a high possibility that your migration code will be relatively short-lived. If you’re raising the PRs, point out which files are temporary and will eventually be removed; add a TODO or create a follow-up task in your backlog. If you’re reviewing, use your judgement with empathy, knowing which pieces of code are short-lived. It’s a balancing act between proper coding standards and what makes sense in your situation.

Write user stories

Something that’s often neglected is the importance of communicating the value of a database migration through user stories. For example, you received user complaints about how slow a legacy web application has been loading. After discussing with your team, you decided to run a migration where you create a new index to speed up your table queries.

Instead of writing:

Create a new index X in table Y…

You could write:

As a user, I want to load my list of documents within n milliseconds…

Write your migration user stories so that when a stakeholder looks at your backlog, they understand why you’re prioritizing the database migration work.
