Thomas M. Kehrenberg

A few thoughts on my self-study for alignment research

In June, I received a grant from the LTFF for a 6-month period of self-study aimed at mastering the necessary background for AI alignment research.  The following is a summary of some of the lessons I learned.  I have tried to keep it short.

Basic advice

You naturally want to maximize “insights per minute” when choosing what to read.  But don’t expect it to be obvious what the most impactful reading material is!  It often takes actual focused thought to figure this out.

One shortcut is to simply ask yourself what you are really curious about.  The idea is that your curiosity should track “value of information” to some degree, so following it can’t be too wrong.  Also, working through a textbook takes quite a bit of mental energy, so having natural curiosity to power your study is very helpful.

If you don’t already have something you’re curious about, you can use the following technique to figure out what to read:

This was helpful to me, which doesn’t necessarily mean it will be helpful for you, but it may be worth trying.

Advice specifically for AI alignment

The above should hold for any topic; the following advice is for AI alignment research study specifically.

What I read

The following is a list of the concrete things I read.  For each one, I try to give a short review.

Other fundamental math topics that I would have still tried to read about with more time

That said, I already have a basic proficiency in most of these; the goal would have been to deepen that understanding.

What I wrote

In addition to reading, I also did some writing.  Here are my thoughts on the things I wrote.

Conclusion

Overall, I read much less than I set out to (though I didn’t adjust for the planning fallacy in my plans, so they were never realistic).  I also fell ill several times (a consequence of the rollback of COVID restrictions), which didn’t help.  Still, I definitely feel much more prepared for alignment research now.


  1. Obviously, you can’t read them all within 6 months.  I had already read many of them before I started my LTFF grant. ↩︎