On trees and dimensionality

Here's a crisp and thorough essay from Robert Beard thinking through the problem of finding exact matches for a subtree in a larger tree. Unsurprisingly, you can help yourself by preprocessing the larger tree.

(I say "unsurprising" because there are so many graph algorithms that (i) are linear in the size of the graph, so not asymptotically more costly than simply reading the graph; (ii) straightforward to implement; and (iii) extremely informative. Tim Roughgarden calls these "for-free primitives," which is a great way to think about them.)

The striking part (to me) of Robert's discussion is the value of preprocessing the large tree by labeling each node with its height (distance from the nearest leaf). Once you do that, you know that any node with height smaller than the height of the smaller tree cannot be the root of a match. So you can lop off the bottom of the tree from consideration.

Even if you're not worried about graph algorithms or interview questions, it's worth keeping in mind that most of a tree is near the bottom (if the tree is balanced or nearly so):

  1. It's the key to seeing why a heap can be constructed in linear time;
  2. It's why one or the other of breadth-first and depth-first search can be much better than the other, even though both take linear time (sometimes solutions are more likely to be found nearer to the start);
  3. It's why searching a game tree just a little bit farther can take a whole lot longer;
  4. And I'm pretty sure it helps explain A* search is so powerful in practice.

And from here it's a step to more general facts about plane and solid geometry, which I use, on average, more than once a day in the kitchen:

  1. If carrots A and B are the same shape, but A is twice as long as B, it will yield eight times as much food (as long as you peel carefully);
  2. The cooking time of a meatball or roasted vegetable is extremely sensitive to how big the items are (so don't have much faith in listed cooking times, and if err on the side of making them small if you're in a hurry);
  3. It's a big mistake to assume that every apple in the fridge is calorically equivalent.

See also the curse of dimensionality. This all reminds me of how the Euclidean algorithm is often classified as "elementary," but roughly once a year I come to believe I've never properly understood before.

By the way, Robert is looking for work. He's a brilliant person and engineer. (Disclaimer: also a friend.)

Subscribe to Nate Meyvis

You'll get email when I post new essays and notes.