On having a data object

27 Oct, 2025

Here's a common pattern:

Carve out a chunk of your persistence layer--e.g., everything storing information about hats.
Create a class for managing interactions with that chunk of the persistence layer.
Use that class whenever you need to interact with that part of the persistence layer.
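To make the pattern concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The HatStore class, the Hat record, the add_hat() method, and the table schema are all hypothetical names chosen for illustration; real versions are usually generated by an ORM or an in-house tool.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hat:
    # One shared record type, used by every caller that touches hats.
    id: int
    name: str
    sku: Optional[str]


class HatStore:
    """Single gateway to the hat-related tables (hypothetical schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hats "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, sku TEXT)"
        )

    def add_hat(self, name: str, sku: Optional[str] = None) -> Hat:
        cur = self.conn.execute(
            "INSERT INTO hats (name, sku) VALUES (?, ?)", (name, sku)
        )
        return Hat(id=cur.lastrowid, name=name, sku=sku)

    def get_hats(self) -> list[Hat]:
        rows = self.conn.execute("SELECT id, name, sku FROM hats ORDER BY id")
        return [Hat(*row) for row in rows]


# Every module that needs hats goes through the same class.
store = HatStore(sqlite3.connect(":memory:"))
store.add_hat("fedora", sku="FED-001")
store.add_hat("prototype beret")  # no SKU assigned yet
print([hat.name for hat in store.get_hats()])
```

Callers see get_hats() rather than SQL, which is most of the pattern's appeal.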

People love this pattern. Everyone likes the (apparent) simplicity of plain old objects, and many engineers also think that those objects bring abstraction benefits.[1]

Django and other frameworks enforce this pattern as a basic design principle. It's a motivation for the many ORMs out there. And almost any big company has a standard mechanism for defining and generating classes of this sort.

This data-object pattern has so much infrastructure and so much consensus supporting it that it can seem like the only reasonable option. But often it's not, and you would be better off using an alternative, for a few reasons:

1. You should often be using different objects in different contexts.

Parts of the codebase that initially seem to require identical objects--e.g., because those objects both concern the same real-world item--often require subtly different objects. But the data-object pattern requires you to use the same object for both of them.

The domain-driven design ("DDD") way to say this is: bounded contexts require different models. But accepting this point requires no ideological commitments. A Hat object can properly include a representation of a SKU in some contexts but not others; a representation of a discount or price in some contexts but not others; CAD artifacts in some contexts but not others; and so on.

I've never seen a good way to use the same object across contexts. You can mark the SKU field as Optional, but the semantics are wrong: in any given context, the field is almost never truly optional. It is either required or irrelevant, so you have to write extra validation code wherever it is required. (Also, these objects tend to be hard to test for the same reasons decorated functions are, and it can be tricky to work around this.)
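A small sketch of the Optional problem, assuming a shared Hat dataclass and two hypothetical callers, place_order() and render_design_sheet(). None of these names come from a real codebase; they only illustrate the mismatch between the type and the actual requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hat:
    name: str
    sku: Optional[str] = None  # "optional" only because some contexts ignore it


def place_order(hat: Hat, quantity: int) -> str:
    # The type says sku may be None, but ordering always needs one,
    # so the requirement has to be re-checked by hand at runtime.
    if hat.sku is None:
        raise ValueError("ordering requires a SKU")
    return f"{quantity} x {hat.sku}"


def render_design_sheet(hat: Hat) -> str:
    # Hat creation never reads the SKU; the field is irrelevant here.
    return f"Design sheet: {hat.name}"
```

The type checker cannot tell these two callers apart: both accept a Hat, and only the runtime check in place_order() records that one of them actually needs the SKU.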

Meanwhile, if you have different Hat objects in your hat-ordering and hat-creation modules, those objects can behave exactly as those contexts require. Those objects might look similar, and parts of them might feel untidily repetitive, but the benefits of accuracy usually outweigh those of consolidation.
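As a rough sketch of what context-specific objects might look like: the class names OrderableHat and HatDesign, their fields, and order_total() are assumptions made for illustration. In a real codebase each class would live in its own module and could simply be called Hat there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderableHat:
    # Ordering context: a SKU and a price are always present.
    sku: str
    price_cents: int


@dataclass(frozen=True)
class HatDesign:
    # Creation context: no SKU or price exists yet, but CAD artifacts do.
    name: str
    cad_file: str


def order_total(hat: OrderableHat, quantity: int) -> int:
    # No None checks needed: the type already guarantees a price.
    return hat.price_cents * quantity
```

The two classes share nothing, which can look repetitive once they grow, but each one states exactly what its context requires and nothing more.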

2. The pattern pushes you to treat different access patterns as the same.

Semantically analogous operations ("what are all the orders for this hat SKU?") can require importantly different implementations. Sometimes you need strongly consistent reads, and sometimes you don't; sometimes you'll be planning to filter or query the result further; and so on.

You certainly can make your orders_for_hat() function accept various flags and parameters to satisfy the requirements of its various callers. But this tends to (i) be messy and (ii) break encapsulation. Very often, you're still implementing multiple units of functionality--precisely what you were trying to avoid!--but in a clunkier, more bug-prone way.
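Here is one way the flag-laden version can end up looking, next to the alternative of small functions owned by the modules that need them. The flags, the Order record, and the PRIMARY and REPLICA lists (stand-ins for a primary database and a possibly stale read replica) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    sku: str
    shipped: bool


# Stand-ins for a primary database and a possibly stale read replica.
PRIMARY = [
    Order(1, "FED-001", True),
    Order(2, "FED-001", False),
    Order(3, "CAP-007", False),
]
REPLICA = PRIMARY[:2]  # the replica has not yet seen order 3


def orders_for_hat(sku, consistent=False, only_unshipped=False, ids_only=False):
    # One shared entry point: each flag encodes one caller's needs.
    source = PRIMARY if consistent else REPLICA
    orders = [o for o in source if o.sku == sku]
    if only_unshipped:
        orders = [o for o in orders if not o.shipped]
    return [o.order_id for o in orders] if ids_only else orders


# The same behavior as small functions owned by the modules that need them.
def unshipped_order_ids_for_fulfillment(sku):
    # Fulfillment needs strongly consistent reads of unshipped orders.
    return [o.order_id for o in PRIMARY if o.sku == sku and not o.shipped]


def order_count_for_dashboard(sku):
    # A dashboard can tolerate a slightly stale count.
    return sum(1 for o in REPLICA if o.sku == sku)
```

In the shared version, the flags are the callers' requirements leaking into the signature, and the return type changes with ids_only. The separate functions implement the same units of functionality, but each one says what it is for.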

3. The classes get huge and painful.

The most important objects you deal with--e.g., the Hat object in hat-management software--will wind up with tons of code.

That's not all bad. As a system matures, it needs to keep track of a lot of things; the real world is messy. Huge Hat classes are, in part, a sign that you've figured out a lot of the little details you need to represent. But if all those little details are part of a single big class you're importing everywhere, you're compounding necessary difficulties with unnecessary ones. As I've argued, you'll be telling yourself lies with your type system; making your methods know too much about their callers (and, often, making your callers know too much about the methods); and making it all hard to test.

This makes for a bad time. I've worked in many codebases where the central persistence-facing objects were huge (in one memorable case, well over 2,000 lines). I've never seen it go well.

4. You get an extra point of failure.

The promise of the data-object pattern is to replace N modules with 1. But if the N things you're "replacing" are actually necessary, and you'll be implementing more or less damaged versions of those N things whether or not you use the data-object pattern, then you're going not from N to 1 but from N to N + 1.

Moreover, that "+ 1" tends to punch above its weight. So, for example, if the object is being automatically generated according to some DSL in some build process, a lot can go wrong. But that's a different post.

For now, remember that the data-object pattern is neither a law of nature nor a universal best practice. A clean, module-specific persistence-management layer is often the best available tool.

[1] Not all the data objects you'll see with this pattern are "plain old" objects in the Fowler et al. sense of "POJO." The point is simply that people want to call get_hats() more than they want to look at SQL or a DynamoDB query. (Also, even senior-level colleagues will advocate for this pattern because it "gives us a POJO," whether or not that's formally accurate.)