Nate Meyvis

First thoughts about token limits

Bloomberg reports that Uber is capping employees' token spending at $1,500 per tool. Here is Simon Willison commenting. He notes that his own usage, were he not on a subsidized individual plan, would come out to approximately $1,000 per month on each of two tools.

I am neither Simon Willison nor an engineer at Uber, but $2,000 of tokens per month sounds broadly correct for someone who is a full-time, AI-forward engineer. My own usage has a lot to do with how often given tasks let me /clear, and some of my planning and thinking time these days uses zero tokens, but that certainly seems like the right order of magnitude for doing a certain style of work (see below) day in and day out.

Simon (reasonably) focuses on token costs as they compare to salary costs,[1] but I'm tempted to think in a different direction:

There's a limit to how many tokens you can use in a certain kind of programming, where you're juggling a bunch of terminal windows and are thinking about what's going on in each of them. If you're /clearing anywhere near as often as you should be, you're constrained by some combination of your attention, your thinking speed, the tools' speed, typing speed, and so on. I'm sure there are many Uber engineers with more stamina than I have (it's important!), who understand their domains more fluently, and so on. But there's a certain kind of modern engineering work (call it "type-I work") where the programmer is directly or semi-directly managing a set of tasks. That kind of work has bandwidth limits, and I strongly suspect that the Uber policy will not throttle you in it, even if you are very hard-working and very AI-forward.

Not all work, of course, is of the sort I just described. You might launch massively parallel testing jobs, large research projects, iterative data-management or -analysis sessions, and other things that have effectively unbounded scope. These projects (call them "type-II work") might be valuable, but it makes sense to apply different criteria to them.

That's why I think of the Uber policy in terms of what kind of AI-first work you're allowed to do unilaterally. I strongly suspect that, if an Uber engineer has a good idea for a long-running LLM-based integration-testing scheme (or whatever), there are ways to get it approved and implemented. We might, then, approximate the policy as "do all the type-I work you want, and experiment with type-II work, but type-II work needs to become an approved, team-level project before it gets too big."

Two final notes:

  1. These are, again, just a practitioner's first, speculative reactions. I know a lot about some big companies, but not Uber. I'm eager to read more about policies like these.
  2. Many people will say that companies should trust their engineers and that many companies have given programmers effectively unlimited discretion in using computational resources. That's fair enough, but if there's ever a time to consider a maximum, it's when (i) nobody has more than a few months' experience with state-of-the-art tools and (ii) there's a(n arguably overheated) culture of using the tools as heavily as possible.

  1. But he doesn't account for payroll taxes, insurance, benefits, and so on. The ratio between the maximum token cost and the fully loaded per-engineer cost is therefore a lot lower than the 11% he gives, which is the ratio between the yearly maximum and the listed median salary.
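
To make the footnote's arithmetic concrete, here is a minimal Python sketch. Every number in it is a placeholder I made up for illustration: the salary, the number of tools, and the overhead multiplier are not Uber's figures, and I am assuming the cap applies per tool per month. The function names are hypothetical too. The only point is that dividing by a fully loaded cost instead of a bare salary shrinks the ratio.

```python
def yearly_token_cap(monthly_cap_per_tool: float, tools: int) -> float:
    """Yearly spending ceiling, assuming the cap applies per tool per month."""
    return monthly_cap_per_tool * tools * 12


def cap_ratio(yearly_cap: float, annual_salary: float,
              overhead_multiplier: float = 1.0) -> float:
    """Ratio of the yearly token cap to the per-engineer cost.

    overhead_multiplier scales salary up to a fully loaded cost
    (payroll taxes, insurance, benefits, and so on). A value of 1.0
    means "compare against bare salary".
    """
    return yearly_cap / (annual_salary * overhead_multiplier)


# Illustrative placeholders only, not reported figures.
EXAMPLE_MONTHLY_CAP = 1_500   # dollars per tool per month (period assumed)
EXAMPLE_TOOLS = 2             # hypothetical number of capped tools
EXAMPLE_SALARY = 300_000      # hypothetical annual salary in dollars
EXAMPLE_OVERHEAD = 1.4        # hypothetical fully loaded multiplier

if __name__ == "__main__":
    cap = yearly_token_cap(EXAMPLE_MONTHLY_CAP, EXAMPLE_TOOLS)
    print(f"vs. salary:       {cap_ratio(cap, EXAMPLE_SALARY):.1%}")
    print(f"vs. fully loaded: {cap_ratio(cap, EXAMPLE_SALARY, EXAMPLE_OVERHEAD):.1%}")
```

Swap in whatever salary and overhead figures you trust. Any multiplier above 1.0 pushes the second ratio below the first.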

#generative AI #nuts and bolts #sociology of software #software