A few thoughts on my self-study for alignment research
In June, I received a grant from the LTFF for a six-month period of self-study aimed at mastering the necessary background for AI alignment research. The following is a summary of some of the lessons I learned. I have tried to keep it short.
Basic advice #
You naturally want to maximize “insights per minute” when choosing what to read. But don’t expect it to be obvious what the most impactful reading material is! It often takes actual focused thought to figure this out.
One shortcut is to simply ask yourself what you are really curious about. The idea is that your curiosity should track “value of information” to some degree, so following it can’t be that wrong; moreover, working through a textbook takes quite a bit of mental energy, and natural curiosity helps power you through your study.
If you don’t already have something you’re curious about, you can use the following technique to figure out what to read:
- First, list all the things you could potentially read.
- This step includes looking at recommendation lists from other people. (See below for two possible lists.)
- For each thing on the list, write down how you feel about maybe reading that.
- Be honest with yourself.
- Try to think of concrete reasons that are shaping your judgment.
- Then, look back over the list.
- Hopefully, it should be easier to decide now what to read.
This was helpful to me, which doesn’t necessarily mean it will be helpful for you, but it may be worth trying.
Advice specifically for AI alignment #
The above should hold for any topic; the following advice is specific to studying for AI alignment research.
- I think you basically can’t go wrong with reading all of (or maybe the 80% that interests you most of) the AI alignment articles on Arbital. I found this to be the most effective way to rapidly acquire a basic understanding of the difficulty of the alignment problem.
- In terms of fundamental math, I just picked topics that sounded interesting to me from the MIRI research guide and John Wentworth’s study guide. [1]
- It’s probably also a good idea to read some concrete results from alignment research, if only to inform you about what kind of math is required. I think Risks from Learned Optimization in Advanced Machine Learning Systems is one good option. I don’t know of a good list of other results.
What I read #
The following is a list of the concrete things I read. For each one, I try to give a short review.
- An essentially complete class of admissible decision functions
- An old paper on expected utility maximization.
- I don’t regret reading it, but it’s probably not that interesting to most people.
- VNM rationality
- I recommend that everyone study this.
- Kolmogorov axioms
- It is said that “Bayesians prefer Cox’s theorem for the formalization of probability”, but I think knowing Kolmogorov’s classical probability axioms is also important. (For reference, the axioms are stated after this list.)
- Full recommendation from me, though reading Probability Theory by E. T. Jaynes (which I had done previously) is even more important.
- Type theory (specifically, MLTT)
- The MIRI research guide recommends “Lambda-Calculus and Combinators” for type theory, but that book is mostly focused on lambda calculus (and is a bit difficult to read, in my opinion).
- Homotopy Type Theory (HoTT) is about dependent type theory and is quite a nice read. I really liked reading about type theory and also wrote an introduction to it, but it’s probably not really necessary for AI alignment research.
- If you want to learn about pure lambda calculus, though (Church numerals, the Y combinator, and so on; see the sketch after this list), HoTT is not the right book. I don’t really know of a good book for that.
- Fixed point exercises
- They were harder than I anticipated and I don’t yet really see how they’re relevant to AI alignment, but I trust Scott Garrabrant to know what he’s talking about, so I suppose I would recommend them.
- I think I was too reluctant to look at hints.
- Topology
- Topology is the one big field of mathematics that I previously knew basically nothing about, so I tried to address this.
- The standard textbook, Topology by Munkres, could not really hold my interest, though.
- I found Topology: A Categorical Approach more interesting, but I only read a bit before deciding that other things were more useful to read.
- AI alignment articles on Arbital
- As I mentioned, this receives my highest recommendation.
- Late 2021 MIRI conversations
- There is a lot there and I couldn’t finish it in time.
- I think these will make a lot more sense after you’ve read all the AI alignment articles on Arbital. Or maybe read the two in parallel: any time you want to know more about something that came up in one of the conversations, look it up on Arbital.
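Since the Kolmogorov axioms are only named above, here is the standard statement for reference (this is textbook material, written in my own notation rather than taken from any of the sources above): a probability measure P on a sample space Ω with a σ-algebra of events satisfies non-negativity, normalization, and countable additivity, i.e.

```latex
P(E) \ge 0 \quad \text{for every event } E \in \mathcal{F}, \qquad
P(\Omega) = 1, \qquad
P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i)
  \quad \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
```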
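And since the type theory item mentions Church numerals and the Y combinator without spelling them out, here is a minimal sketch of both in Python, with Python lambdas standing in for pure lambda terms. Because Python evaluates strictly, the sketch uses the eta-expanded “Z” form of the Y combinator; it is purely illustrative and not taken from any of the books above.

```python
# Church numerals: the number n is encoded as "apply f to x, n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting how often it applies its argument."""
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
assert to_int(add(three)(three)) == 6

# Y combinator, strict-evaluation ("Z") variant: Z(f) is a fixed point of f,
# which gives recursion without naming the recursive function.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

factorial = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
assert factorial(5) == 120
```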
Other fundamental math topics I would still have tried to read about, given more time #
- Category theory
- Model theory
- Proof theory
- Game theory
- Optimization theory
That said, I already have a basic proficiency in most of these; the goal would have been to deepen that understanding.
What I wrote #
In addition to reading, I also did some writing. Here are my thoughts on the things I wrote.
- What’s so dangerous about AI anyway?
- I think it was a useful exercise to write this, but the people who were my target audience probably will not read it.
- So the effort spent on editing the article was probably not worth it.
- Introduction to Dependent Type Theory
- Writing this definitely deepened my understanding of the topic.
- Once again, editing took up the majority of the time spent on the article.
Conclusion #
Overall, I read much less than I set out to (though I didn’t adjust my plans for the planning fallacy, so they were never realistic). I also fell ill several times (a consequence of the rollback of COVID restrictions), which didn’t help. Still, I definitely feel much more prepared for alignment research now.
[1] Obviously, you can’t read them all within 6 months. I had already read a lot of them before I started my LTFF grant. ↩︎