Nate Meyvis

More experiments in monitoring Claude

If AI adoption is a game, one important thread of it is racing to the point where you can enable --dangerously-skip-permissions whenever you want that ability. As Simon Willison says, "Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code."

The problem is, of course, security; --dangerously-skip-permissions is not just a clever name. I won't attempt to review the state of sandboxing, settings management, and AI security except to make a few simple points:

  1. There's a tradeoff between security and letting the AI do things;
  2. Sandboxing is a fundamental technique here, and there's a lot to know about it;
  3. I still think that parts of the system other than settings management and sandboxing are not getting quite enough attention.

More specifically, many users are so focused on preventing security lapses that they're not thinking about detecting them. To improve my detection and remediation, I'm doing some experiments in logging and observability, which bring the same benefits here that they bring in any other software system.

Today's experiment is setting up a hook so that Claude logs all of its tool calls. An independent process can then tail the log and raise a flag if anything looks suspicious.

The technique (with minimal sketches of each piece after the list):

  1. Create a log file somewhere.
  2. Create a file for the hook at ~/.claude/hooks/log-all-tools.sh.
  3. In the script, append the hook's stdin (Claude Code pipes the tool-call details there as JSON; cat captures it), the time, and the first few characters of the session UUID to the log file.
  4. Make the script executable.
  5. Add the script to PreToolUse hooks in ~/.claude/settings.json.
  6. Make the hook script user-immutable with uchg.
  7. Make the settings.json file user-immutable also.
  8. Use some other tool to tail or analyze the log file.
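Here is a minimal sketch of the pieces, under my own assumptions: the log lives at ~/claude-tool-calls.log (an arbitrary choice) and jq is installed for JSON parsing. Claude Code pipes a JSON payload describing each tool call to the hook's stdin, including a session_id field:

    #!/bin/bash
    # ~/.claude/hooks/log-all-tools.sh
    # Capture the JSON payload Claude Code pipes to the hook on stdin.
    payload=$(cat)

    # First 8 characters of the session UUID (assumes jq is installed).
    session=$(printf '%s' "$payload" | jq -r '.session_id // "unknown"' | cut -c1-8)

    # Append one timestamped line per tool call.
    printf '%s %s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$session" "$payload" \
        >> "$HOME/claude-tool-calls.log"

    # Exit 0: observe, don't block the tool call.
    exit 0

The corresponding hooks stanza in ~/.claude/settings.json would look something like the following; I believe "*" matches every tool, but the hooks schema has changed over time, so check the current documentation:

    {
      "hooks": {
        "PreToolUse": [
          {
            "matcher": "*",
            "hooks": [
              { "type": "command", "command": "~/.claude/hooks/log-all-tools.sh" }
            ]
          }
        ]
      }
    }

Steps 4, 6, and 7, on macOS (uchg is a chflags flag; undo it later with nouchg):

    chmod +x ~/.claude/hooks/log-all-tools.sh
    chflags uchg ~/.claude/hooks/log-all-tools.sh
    chflags uchg ~/.claude/settings.json

And a toy watcher for step 8; the patterns here are examples, not a serious ruleset:

    # Flag any tool call that mentions curl or wget.
    tail -F "$HOME/claude-tool-calls.log" \
        | grep --line-buffered -E 'curl|wget' \
        | while read -r line; do
            printf 'SUSPICIOUS: %s\n' "$line" >&2
          done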

The technique has benefits:

  1. It's simple.
  2. It's fast.
  3. If I'm not mistaken, it's pretty tough for Claude to go rogue and either ignore the setting or do anything about a file that's been locked down with uchg.
  4. Its operation is independent of Claude's other settings, so you can combine it with any number of other security and observability techniques.
  5. What you do with the log file is independent of how you set it up, so you can change your approach situationally without doing a lot of legwork.
  6. In many situations, quick detection is almost as good as prevention (if, for example, you can deactivate a leaked API key immediately).

...but also some pretty serious limitations:

  1. For some security breaches, detecting them quickly doesn't help much.
  2. You'll need to periodically flush the log file, use a sequence of log files, or otherwise manage the log's growth (see the rotation sketch after this list). Happily, log rotation is a well-studied subject; less happily, it does take some effort.
  3. You might be tempted to use another AI tool to monitor Claude, but sometimes they'll refuse: the gemini CLI, for example, will happily look out for wget and curl but refuses to just "monitor for security breaches."
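For the second limitation, here is a minimal size-based rotation sketch, again under my assumptions (macOS-style stat, the same log path as above); run it periodically from cron or launchd:

    # Rotate the log once it passes ~10 MB.
    LOGFILE="$HOME/claude-tool-calls.log"
    if [ -f "$LOGFILE" ] && [ "$(stat -f%z "$LOGFILE")" -gt $((10 * 1024 * 1024)) ]; then
        mv "$LOGFILE" "$LOGFILE.$(date +%Y%m%d%H%M%S)"
    fi

Because the hook appends with >> (which recreates the file if it's missing), nothing is lost across the rotation.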

I should be clear: however much I like observability, I am not a security specialist. My most relevant experience here is that of being monitored, creatively and aggressively and in ways that I'm sure go far beyond my knowledge, as an employee. If this technique is obviously silly for some reason I don't discuss here, I'd be grateful to know it.

P.S.: Chatting with AI about what could go wrong with this technique and how to improve it is an amusing way to learn a bit about security. It hadn't occurred to me to put a fake honeypot API key into my environment variables and watch for it to leak!
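A hypothetical sketch, with a made-up key value: plant the decoy where Claude can see it, then watch the log for it.

    # In a shell profile Claude inherits: a decoy credential (value is made up).
    export HONEYPOT_API_KEY="hp-0000-fake-key-do-not-use"

    # In a separate monitoring process: alert if the decoy appears in a tool call.
    tail -F "$HOME/claude-tool-calls.log" \
        | grep --line-buffered -F "$HONEYPOT_API_KEY" \
        | while read -r _; do
            echo 'honeypot key appeared in a tool call!' >&2
          done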

#Claude #generative AI #psychology of software #security #software