The future of migrations
Migrations are a big part of real-world software engineering, and they tend to be painful. Some of the reasons are intrinsic:
- While they're happening, you need to support two systems.
- Those systems can get out of sync, which can cause all sorts of problems.
- They often have both a software component (code that needs to change) and a data component (data that needs to move). These are often interdependent.
- You need to test that migrations have succeeded.
- When you're writing those tests, you're writing them against a system that doesn't exist yet, so it's hard to ensure that they're good tests.
- You need a rollback plan, and those can be tricky.
Some are social:
- Many migrations happen because systems aren't working well, and everything above gets harder as the quality of the source system decreases.
- Many engineers hate migrations and don't do their best work on them.
- It can be hard to get institutional support for a migration, so the projects often get rushed.
So, here are some general things we can say about migrations:
- As they take longer, they get much harder. Parts of the old and new systems drift apart and stop communicating correctly, and those failures cascade. The pain grows faster than linearly with time.
- They require a lot more code than they might first seem to: not just the new system code, but everything else that is needed for translation, one-time verification, ongoing testing, and new documentation.
- They mercilessly punish errors in scripting.
I conjecture, then, that generative AI will help us with migrations even more than is commonly appreciated. It's well understood that AI is good at some translation work, but I'm not sure we realize how much AI might eliminate entire categories of work in many migrations. Suppose you go into a migration with:
- Rock-solid data-migration scripts;
- Robust, reasonable integration tests against the new system;
- A healthy test environment you've run load tests against;
- Fast, accurate scripts for testing both that the overall size of the data is correct and that randomly sampled data has been preserved;
- AI-powered browser-based testing.
In many cases, it could be reasonable to simply cut over during off-hours, accept a little downtime, and have no transitional period. What we currently think of as a migration might feel more like an ordinary dependency upgrade or code change. (For concreteness: migrating perhaps a million items from a non-relational database to a relational one, in a low- to medium-complexity system, might well be like this. A recent attempt to do exactly this is what prompted this post.)
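To make the verification item in that list concrete, here is a minimal sketch of a check that the overall size of the data is correct and that randomly sampled records have been preserved. Everything specific in it is an assumption for illustration: the source is modeled as a plain dict of documents keyed by id, the target as a SQLite table named items with columns id, name, and price_cents, and verify_migration and row_fingerprint are hypothetical helper names. A real migration would read from the actual source store and compare whatever fields matter.

```python
import hashlib
import json
import random
import sqlite3


def row_fingerprint(record: dict) -> str:
    """Hash a record so that key order does not affect the result."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_migration(source: dict, conn: sqlite3.Connection,
                     sample_size: int = 100, seed=None) -> list:
    """Return a list of problems found; an empty list means both checks passed.

    source: hypothetical stand-in for the old store, {id: document}.
    conn:   connection to the new database (assumed table: items).
    """
    problems = []

    # Check 1: the overall size of the data is correct.
    (target_count,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
    if target_count != len(source):
        problems.append(
            f"count mismatch: source={len(source)} target={target_count}"
        )

    # Check 2: randomly sampled records survived the move unchanged.
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(source), min(sample_size, len(source)))
    for item_id in sampled_ids:
        row = conn.execute(
            "SELECT id, name, price_cents FROM items WHERE id = ?",
            (item_id,),
        ).fetchone()
        if row is None:
            problems.append(f"missing in target: {item_id}")
            continue
        migrated = {"id": row[0], "name": row[1], "price_cents": row[2]}
        if row_fingerprint(migrated) != row_fingerprint(source[item_id]):
            problems.append(f"content mismatch: {item_id}")

    return problems
```

Passing a fixed seed makes a failing run reproducible, and the returned list can gate the cutover: proceed only if it is empty. Sampling keeps the check fast on a million items, at the cost of being probabilistic rather than exhaustive.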
In many other cases, of course, migrations will keep on being long and thorny. AI, used well, will make some migrations a lot cleaner, but it won't make the underlying problems disappear. Strong engineering judgment--about the size of the data, about the guarantees the system needs to provide, about acceptable risk, about how to verify proper functioning, and much more--will still be necessary.
In fact, I suspect the value of that judgment will be amplified, and that we'll see a pattern here and in many other places:
- A lot of our current first-order knowledge won't matter very much any more;
- Some of it will matter more than ever;
- The ways we categorize work will have to change;
- Strong knowledge of the fundamentals will be invaluable.