Spaced repetition scheduling and categorization

20 Apr, 2026

Should I consider the subject matter of a flashcard explicitly when scheduling reviews? On the one hand, more information is often better when determining a scheduling interval; on the other hand, we want to avoid complexity and overfitting, and one might reasonably hope that my category-by-category differences are already captured in my performance itself, without needing to look at categories explicitly.

I'm approaching a million flashcard reviews and have never used category data in a scheduling algorithm. In that time, I've made a lot of flashcards about movies.¹ Here is my performance in two common response-pattern situations, split between questions about movies and questions not about movies.

First, when I make a card and get the first review correct, here's my performance on the second review:

[Figure: second-review performance after a correct first review, movie vs. non-movie cards]

And when I get my first two reviews correct, here's my performance on the third review:

[Figure: third-review performance after two correct reviews, movie vs. non-movie cards]
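For anyone who wants to run a similar split on their own data, here is a minimal sketch of how it could be computed. It assumes a review log in which each row has the hypothetical fields `card_id`, `timestamp`, `correct`, and `is_movie`; those names and the function itself are illustrative, not the code behind the figures above.

```python
from collections import defaultdict


def accuracy_after_streak(reviews, streak_len):
    """Return {is_movie: (correct, total)} for the review that comes right
    after `streak_len` consecutive correct answers at the start of a card's
    history. Field names are assumptions made for this sketch."""
    # Group each card's reviews in chronological order.
    by_card = defaultdict(list)
    for r in sorted(reviews, key=lambda r: r["timestamp"]):
        by_card[r["card_id"]].append(r)

    counts = {True: [0, 0], False: [0, 0]}
    for history in by_card.values():
        if len(history) <= streak_len:
            continue  # the card has not reached the review of interest
        if not all(r["correct"] for r in history[:streak_len]):
            continue  # the opening reviews do not match the pattern
        target = history[streak_len]
        bucket = counts[bool(target["is_movie"])]
        bucket[0] += int(target["correct"])
        bucket[1] += 1
    return {k: tuple(v) for k, v in counts.items()}
```

Calling `accuracy_after_streak(reviews, 1)` corresponds to the first situation (first review correct, look at the second), and `accuracy_after_streak(reviews, 2)` to the second. Dividing `correct` by `total` in each bucket gives the recall rate per group.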

Some notes and caveats:

These are just some high-volume patterns I checked; if you think there's some other category-specific analysis I ought to do, I'd be grateful to know it.
I don't organize my studying by "decks"; I study my cards all together, and those cards are tagged by category.²
Not all my movie-related cards are tagged as such, and a lot of my tagging has been done for me by AI (but that's another post). So these data are not even close to perfect. That said: (i) I've manually inspected a lot of LLM-generated tags, and they look good, and (ii) insofar as I have movie cards that are still untagged, my non-movie / movie gap would be understated.
As with all my analyses, I'm working with self-experimental data on which all sorts of hidden mechanisms might be operating. So, for example, I might have made a lot of flashcards about movies when I was scheduling reviews somewhat differently or when I was systematically sleep-deprived. I really doubt that any effect like this is operative here, but these are definitely one person's idiosyncratic data.
It's not clear to me what, if anything, I should do about this. The goal of scheduling is not, ultimately, to predict how likely I am to remember a card; it is to learn things durably. I'd only want to schedule movie-card reviews more conservatively if I were confident it would get me to long-run retention more efficiently. I suspect it would, but I can't be sure. And, as with long-interval performance, I'm not even sure what questions to ask.
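The tagging caveat above, that untagged movie cards would make the measured gap smaller than the true one, can be checked with a little arithmetic. The sketch below uses made-up numbers (not my data) and assumes that missing tags are unrelated to card difficulty and that no non-movie card is wrongly tagged as a movie card; `observed_gap` is a hypothetical helper written for this illustration.

```python
def observed_gap(p_movie, p_other, n_movie, n_other, untagged_share):
    """Measured recall gap (non-movie minus movie) when a share of the true
    movie cards lacks a tag and is therefore counted as non-movie."""
    # Movie cards that leak into the non-movie group.
    leaked = n_movie * untagged_share
    # The tagged movie group stays pure, so its rate is unchanged
    # (assumption: which cards get tagged is unrelated to difficulty).
    movie_rate = p_movie
    # The non-movie group becomes a mix of true non-movie and leaked cards.
    other_rate = (p_other * n_other + p_movie * leaked) / (n_other + leaked)
    return other_rate - movie_rate


# Illustrative inputs only: 80% vs. 90% recall, 1,000 vs. 9,000 cards.
full_tagging = observed_gap(0.80, 0.90, 1000, 9000, untagged_share=0.0)
half_tagging = observed_gap(0.80, 0.90, 1000, 9000, untagged_share=0.5)
```

With these made-up inputs, leaving half of the movie cards untagged shrinks the measured gap from 10 points to roughly 9.5, so contamination of this kind pushes the observed difference toward zero rather than inflating it.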

¹ All the trivia competitions I care about have questions about movies, and it's a weak area for me. It's also a subject that's amenable to study in that (i) there's a core of high-value information to learn (e.g., Best Picture winners) and (ii) it's not hard to organize, find, and present the information.
² Once in a while I use a feature that lets me filter cards by tag while studying, but I much prefer just to study everything together. It's more pleasant; it helps me break up clusters of related questions; and it helps keep me from guessing the answer from the category of a question.

#self-experimentation #spaced repetition