Nate Meyvis

"Harness engineering" is not new

"Harness engineering" is a new term variously defined as "the system around the model," "everything in an agent except the model," or something similar. There's some excellent work in the "harness engineering advice" genre, but it contains no fundamentally new techniques.

It is not easy to embed a quasi-magical tool in a software system, but we've long been embedding powerful, unpredictable things in software systems: cars and traditional machine-learning models are two examples. One common surprise for new engineers at big tech companies is to learn that some generate_response() function in fact generates a task to be completed by a human being.1 All of these need surrounding software to check unpredictable results, coordinate subsystems of different latencies, and use powerful tools to greatest advantage. This is hard to do, but it is a new variant of a well-studied hard thing. If you never took a class in "harness engineering" and don't have a specific mental model for a "software harness," please do not think you are behind. As in so many other ways, everything and nothing is new.

Here are some questions you can ask about your use of AI in order to help yourself draw on analogies:

  1. What is the expected duration of your AI invocations?
  2. What is the variance in the duration of these invocations?2
  3. What kinds of fast, deterministic checks can you run on LLM outputs?
  4. What kinds of fast, deterministic checks can you run on LLM inputs?
  5. How bad is it if an invocation doesn't get a good LLM response?
  6. How bad is it if a bad response gets into a downstream subsystem?
  7. How often will you want to change configurable, LLM-facing parts of the system?3
  8. What of this can you monitor?
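
To make a few of these questions concrete, here is a minimal sketch in Python of the kind of wrapper they point toward. It is one possible shape and not a recommended design. Every name in it is hypothetical: call_model stands in for whatever function reaches your model, and the length limit, the retry count, and the rule that a good response is a JSON object with a string "answer" field are assumptions made up for illustration.

```python
import json
import statistics
import time
from typing import Callable, Optional

MAX_PROMPT_CHARS = 4_000  # hypothetical input limit
MAX_ATTEMPTS = 3          # hypothetical retry budget

durations: list[float] = []  # crude stand-in for real monitoring (question 8)


def check_input(prompt: str) -> bool:
    """Fast, deterministic input check (question 4)."""
    return 0 < len(prompt) <= MAX_PROMPT_CHARS


def check_output(raw: str) -> Optional[dict]:
    """Fast, deterministic output check (question 3).

    Assumes a good response is a JSON object with a string "answer" field.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(parsed, dict) and isinstance(parsed.get("answer"), str):
        return parsed
    return None


def run_with_harness(call_model: Callable[[str], str], prompt: str) -> Optional[dict]:
    """Call the model, keeping unchecked output away from downstream code."""
    if not check_input(prompt):
        return None  # reject before spending a model call
    for _ in range(MAX_ATTEMPTS):
        start = time.perf_counter()
        raw = call_model(prompt)
        durations.append(time.perf_counter() - start)
        result = check_output(raw)
        if result is not None:
            return result  # only validated output goes downstream (question 6)
    return None  # the caller decides how bad a missing response is (question 5)


def latency_summary() -> tuple[float, float]:
    """Mean and population variance of observed durations (questions 1 and 2).

    Raises statistics.StatisticsError if no calls have been recorded yet.
    """
    return statistics.mean(durations), statistics.pvariance(durations)


if __name__ == "__main__":
    # A fake model that fails the output check once, then succeeds.
    replies = iter(["not json", '{"answer": "42"}'])
    print(run_with_harness(lambda _prompt: next(replies), "What is 6 * 7?"))
    print(latency_summary())
```

None of this is specific to LLMs: the same validate, retry, and measure structure would wrap any slow, unreliable dependency.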

The tools we're "harnessing" are new, and having AI tools to build with changes both what we build and how we build it, but we are answering the same structural questions we've always been asking.


  1. There are lots of stories of people not understanding this, running a loop that calls such a function many times, and costing their companies tens of thousands of dollars.

  2. This is important in, for example, finding the right approach to load balancing.

  3. This might just be a prompt.

#generative AI #nuts and bolts #software