Marco Arment did something awesome
Here's episode 683 of the Accidental Tech Podcast. In a long segment, Marco, who makes a podcast player, describes how he decided to support transcripts and wound up managing a fleet of Mac Minis and renting space in a data center. More than once I had to make sure it was still March (to be sure this wasn't an April Fools' joke).
I can't do justice to the whole story here, but the key elements are: Apple offers a local API that does audio transcripts; Marco experimented with it and it went well enough that he wanted to scale the process; and some combination of back-of-the-envelope math and personal preference made him decide to spin up a fleet of Macs instead of using some cloud-based solution.
The story is both instructive and entertaining, even if you (like me!) would never do such a thing. (In the parallel world where I had this problem, I'd be talking for hours about queues and OpenRouter and data structures and would not have thought about power-supply voltage for a single second.)
Some miscellaneous notes:
- Scale makes things different. In almost all contexts, "how do I get transcripts for podcasts?" does not lead to figuring out the smallest units data centers will rent out, trying to predict the future of API pricing, and so on.
- As I hinted at above, it's hard to tell how much of this is driven by Marco's preferences and how much by external economics. I'm skeptical that a full accounting of the situation, including future improvements in models and lower costs of the cheap ones, and also including the cost of Marco's effort, would have mandated he take this path...
- ...but he is the domain expert and I am not, and one non-obvious important feature of the situation is that a certain kind of time-stamped output is really valuable. The kind of output you need turns out to depend on the effects of dynamic ad insertion, iOS mechanisms, and so on.
- I'm amazed that it's economically rational, or even near-rational, for a podcast app to do all this to support transcripts a couple years earlier than it would otherwise be able to.
- When I think about ecosystem lock-in, I tend to think of "I want to send my parents photos easily," not "I might buy 50 Mac Minis in part because I'm good at working with a certain style of API and some day I'll probably offload some of the work to people's phones and that will be easier if I just use Apple APIs from the start."
- This is yet more evidence that encapsulation is underrated: big, expensive decisions are often made for reasons of micro-level software compatibility. The better your encapsulation is, the less micro-level structure has to dictate macro-level structure.1
- So much of the economics of this situation depends on Marco's preferences, skills, and background. When we think of software being personal, we're often thinking of "someone likes this color" or "there are jokes in the loading screen." Here "personal" means "Marco has spent years and years learning everything about audio files and RSS, and also loves Apple stuff, and has the temperament to enjoy constructing this fleet of Mac Minis, and that's why I'm going to have transcripts in my podcast player soon." It's a joy to use software like this and to be part of a profession where stuff like this happens.
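To make the encapsulation note above concrete, here is a minimal sketch of the kind of abstraction boundary I have in mind. Everything in it is hypothetical: the `Transcriber` protocol, the two backend classes, and the `Segment` type are names I made up for illustration, and the backends return placeholder text rather than calling any real Apple or cloud API. I have no idea what Marco's code looks like.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Segment:
    """One time-stamped chunk of transcript (hypothetical shape)."""
    start_seconds: float
    text: str


class Transcriber(Protocol):
    """The boundary: callers only ever see this one method."""
    def transcribe(self, audio_path: str) -> List[Segment]: ...


class OnDeviceTranscriber:
    """Placeholder for a backend wrapping a local speech API (assumption)."""
    def transcribe(self, audio_path: str) -> List[Segment]:
        return [Segment(0.0, f"[local transcript of {audio_path}]")]


class CloudTranscriber:
    """Placeholder for a backend calling a hosted model (assumption)."""
    def transcribe(self, audio_path: str) -> List[Segment]:
        return [Segment(0.0, f"[cloud transcript of {audio_path}]")]


def build_transcript(transcriber: Transcriber, audio_path: str) -> str:
    """Format segments as 'seconds text' lines, ignoring which backend ran."""
    segments = transcriber.transcribe(audio_path)
    return "\n".join(f"{s.start_seconds:.2f} {s.text}" for s in segments)
```

The point is only that `build_transcript` never learns which backend it got, so swapping a rack of machines for a hosted service (or for people's phones) would be a change at one call site instead of a macro-level decision.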
1. That's not a point about Marco; it's a point about how many times in my career I've thought "wait, they did all that instead of just setting up an abstraction boundary?"