Nate Meyvis

Databases make good databases

Here are two stories on a common theme from my career. They are fictionalized, but faithful to the truth.

1. Load-bearing file names

A hat retailer has periodic analytics workflows that produce .json files. They need to track both when each workflow ran and whether it used v1 or v2 of their analytics process. So they put files in their bucket-service:analytics/ folder with names like 1765927385_v1.json. Consumers list the files in the folder and parse the filenames to figure out which files are relevant to them.

This can work, but notice that it amounts to using a bucket's file names as a database.
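To make that concrete, here is a minimal sketch of the logic every consumer ends up carrying around. The function names and the exact filename pattern are my own assumptions, inferred from the single example name above; a real bucket would be listed through the storage provider's client, which I leave out here.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional

# Assumed pattern: <unix_timestamp>_v<version>.json, e.g. 1765927385_v1.json
RUN_FILENAME = re.compile(r"^(?P<ts>\d+)_v(?P<version>\d+)\.json$")


class AnalyticsRun(NamedTuple):
    ran_at: datetime
    version: int
    filename: str


def parse_run_filename(filename: str) -> Optional[AnalyticsRun]:
    """Decode the metadata packed into a filename; None if it does not match."""
    match = RUN_FILENAME.match(filename)
    if match is None:
        return None
    ran_at = datetime.fromtimestamp(int(match["ts"]), tz=timezone.utc)
    return AnalyticsRun(ran_at, int(match["version"]), filename)


def latest_run(filenames: Iterable[str], version: int) -> Optional[AnalyticsRun]:
    """Scan a full listing and return the newest run for one process version."""
    runs = [
        run
        for run in map(parse_run_filename, filenames)
        if run is not None and run.version == version
    ]
    return max(runs, key=lambda run: run.ran_at, default=None)
```

Every query is a full listing plus a regex, and any file that does not fit the pattern is silently skipped. That is the "database" doing its work.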

2. A high-powered enum

You, our hat company protagonist, have an enum with all the hat styles:

from enum import Enum

class Hat(Enum):
	STETSON = "STETSON"
	FEDORA = "FEDORA"
	BERET = "BERET"

Then you decide to make the values correspond to integer IDs:

class Hat(Enum):
	STETSON = 1
	FEDORA = 2
	BERET = 3
	...

One day, you collect variant names in your system and use the aliasing feature of your favorite language’s Enum class to track and handle them:

class Hat(Enum):
	STETSON = 1
	COWBOY_HAT = 1
	FEDORA = 2
	FADORA = 2  # typo in legacy code
	BERET = 3
	...

Now you can use Hat["TRILBY"] to figure out whether "TRILBY" is the name or an alias of a hat in your system (a KeyError means it is not).
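Here is a small sketch of that lookup, assuming the language is Python and its standard-library enum module. The resolve_hat helper is hypothetical; the alias behavior itself is standard: a second name with an existing value becomes an alias for the first member.

```python
from enum import Enum
from typing import Optional


class Hat(Enum):
    STETSON = 1
    COWBOY_HAT = 1  # same value as STETSON, so this is an alias
    FEDORA = 2
    FADORA = 2  # typo in legacy code
    BERET = 3


def resolve_hat(name: str) -> Optional[Hat]:
    """Return the canonical member for a name or alias, or None if unknown."""
    try:
        return Hat[name]
    except KeyError:
        return None


# Aliases resolve to the canonical member; unknown names resolve to None.
assert resolve_hat("COWBOY_HAT") is Hat.STETSON
assert resolve_hat("TRILBY") is None
```

Iterating over Hat yields only the three canonical members; the aliases are visible through Hat.__members__. That asymmetry is the kind of thing the helper functions start accumulating around.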

... But the system grows, and those IDs need to match the IDs in another part of your system. You wind up with comments like this in your code:

# DO NOT add anything to this enum except with this process:
# 1. Run `cursed_shell_script.sh`;
# 2. Copy and paste the UUID it emits after the success message;
# 3. Use that UUID as the value in this `Hat` enum;
# 4. Commit your changes in the same pull request.

Over time, your enum grows, more modules reference it, and you find yourself writing helper functions to reason about its contents. Instead of using a database, you are using (i) a single flat file and (ii) the mechanics of your favorite language’s Enum class.


In these cases, and others I've seen, the mechanics of a database (to a first approximation, the accurate, performant storage and retrieval of interrelated values) are supplied by non-database tools. This can work: I fondly remember the opening of the storage and retrieval chapter of Designing Data-Intensive Applications, where Kleppmann constructs a primitive database from a flat file.

Databases exist for a reason, though, and that reason gets clearer over time and at scale. All the edge cases, brittle behaviors, and performance concerns that arise in the cases above are better addressed when you're using a database to do database-like things. Whether from inertia or stubbornness, however, I find that teams often cling to non-database solutions far longer than they should.
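For comparison, here is roughly what both stories could look like with an actual database. This is a sketch, not a recommendation of a particular schema: I am assuming SQLite through Python's sqlite3 module, and the table names, column names, and object keys are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    -- Story 1: run metadata lives in columns, not in the file name.
    CREATE TABLE analytics_runs (
        id INTEGER PRIMARY KEY,
        ran_at INTEGER NOT NULL,           -- Unix timestamp
        process_version INTEGER NOT NULL,  -- 1 or 2
        object_key TEXT NOT NULL UNIQUE    -- where the .json file actually lives
    );
    -- Story 2: hats and their aliases are rows, not enum members.
    CREATE TABLE hats (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE hat_aliases (
        alias TEXT PRIMARY KEY,
        hat_id INTEGER NOT NULL REFERENCES hats(id)
    );
    """
)

# Illustrative rows only.
conn.executemany(
    "INSERT INTO analytics_runs (ran_at, process_version, object_key) VALUES (?, ?, ?)",
    [
        (1765920000, 1, "analytics/run-a.json"),
        (1765927385, 1, "analytics/run-b.json"),
        (1765930000, 2, "analytics/run-c.json"),
    ],
)
conn.executemany(
    "INSERT INTO hats (id, name) VALUES (?, ?)",
    [(1, "STETSON"), (2, "FEDORA"), (3, "BERET")],
)
conn.executemany(
    "INSERT INTO hat_aliases (alias, hat_id) VALUES (?, ?)",
    [("COWBOY_HAT", 1), ("FADORA", 2)],
)


def latest_run_key(conn, version):
    """Return the object key of the newest run for a process version, or None."""
    row = conn.execute(
        "SELECT object_key FROM analytics_runs "
        "WHERE process_version = ? ORDER BY ran_at DESC LIMIT 1",
        (version,),
    ).fetchone()
    return row[0] if row else None


def find_hat(conn, name):
    """Return the canonical hat name for a name or alias, or None if unknown."""
    row = conn.execute(
        "SELECT hats.name FROM hats "
        "LEFT JOIN hat_aliases ON hat_aliases.hat_id = hats.id "
        "WHERE hats.name = ? OR hat_aliases.alias = ? LIMIT 1",
        (name, name),
    ).fetchone()
    return row[0] if row else None
```

The point is not these particular tables. It is that uniqueness, references between records, and "give me the latest v1 run" become things the database checks and answers, rather than conventions that live in file names and code comments.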

Try not to make this mistake! Keep it simple. Databases make good databases.