Thomas M. Kehrenberg

What’s so dangerous about AI anyway?

(Or: What it means to be a superintelligence.)

You might have heard people worrying about AI as a world-ending threat, but maybe you’re not really convinced. Maybe current “AI” seems quite unimpressive to you and it’s not clear to you that it is coherent to talk about “smarter-than-human AI” at all.

“Like, okay,” you might say, “an AI might be better at calculus than us. Or, it knows more facts about the world than a human because it can store everything on its hard drive. Maybe it can also give really precise answers sometimes, like specifying a probability to 8 significant digits. But that doesn’t sound all that dangerous overall? I know some smart people and I’m not particularly afraid of them. I mean sure, we shouldn’t use AI to power flying robots with lasers and then let them roam the country – that’s just common sense. And, of course, big corporations can do bad things with AI when they use it for hiring decisions or social media moderation. But every technology has upsides and downsides. I don’t see what’s so uniquely dangerous about AI.”

In order to understand what an AI might be capable of, I suspect it helps to taboo the word “intelligence” and talk about more specific cognitive capabilities instead, which we can then extrapolate into super-human regions.

We certainly don’t know how to (efficiently) implement these cognitive capabilities with current machine learning or any other approach we have, but the point is that we can talk about what these algorithms – once found – will look like from the outside; even if we currently don’t know how they would work internally.

In general, everything surrounding the concept of intelligence was historically very mysterious to people. But we are making progress! As an analogy, consider how confused we once were about computer chess: In 1836, Edgar Allan Poe wrote an essay asserting that chess was impossible for an ‘automaton’ to play (well) even in principle. Then in 1949, Claude Shannon showed a way to do it with unlimited computing power (and also an idea of how to do it with limited but still very large computing power). Nowadays, DeepMind can train a super-human chess AI from scratch in less than 10 hours on a TPUv2 pod (that is, with 11.5 PetaFLOP/s; so, about 4.5 months of training on the largest supercomputer from 2002).
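That comparison can be checked with quick arithmetic. A sketch, using the 11.5 PFLOP/s figure above and assuming roughly 35.9 TFLOP/s for the Earth Simulator, the fastest supercomputer of 2002 (that last figure is my addition, not from the text):

```python
# Rough compute comparison: a TPUv2 pod vs. the top supercomputer of 2002.
# 11.5 PFLOP/s for ~10 hours, replayed on an assumed ~35.9 TFLOP/s machine.

pod_flops = 11.5e15            # FLOP/s of a TPUv2 pod
train_seconds = 10 * 3600      # ~10 hours of training
total_flop = pod_flops * train_seconds

earth_sim_flops = 35.9e12      # FLOP/s, Earth Simulator (2002), assumed figure
seconds_needed = total_flop / earth_sim_flops
months_needed = seconds_needed / (30 * 24 * 3600)

print(f"{months_needed:.1f} months")  # → 4.4 months
```

Which matches the “about 4.5 months” in the text, up to rounding of the hardware figures.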

The point is that, at least with respect to some cognitive skills, our understanding is now closer to the Claude Shannon era than to the Edgar Allan Poe era. We now have at least some idea of what kind of cognitive algorithms the human brain is running, and so we are in a position to imagine what an agent with even better cognitive algorithms would be like.

So, let’s consider the cognitive skill of discovering the underlying pattern behind some observations. Humans are quite good at that: as babies, we might still be surprised when things fall in a certain direction, but soon we have synthesized the law that all things fall to the ground. Similarly, humans observed that when dogs reproduce, the puppies are similar to their parents. We first used that knowledge to create specific dog breeds, but later, Charles Darwin distilled this and other observations into the theory of natural selection – a succinct law that explains a whole lot of phenomena in the natural world.

In both cases, there were some observations which appeared random at first – like finches differing slightly between the Galápagos islands and being remarkably well-adapted to their respective island – that could be neatly explained by a universal rule, once that rule was found.

We also use this process – which we might call induction, though philosophers might disagree with me there – constantly in everyday life. We take some observations, like: today, Alice did not hum as usual when she made tea, and she barely ate anything of the pasta she usually likes so much; and we come up with possible explanations for the underlying pattern: maybe she’s stressed from something that’s happening at work or she’s sad about some news she got from her family.

A human might generate some guesses as to the underlying pattern and then check whether those guesses can explain the observations.

Now imagine a thinking machine that can do that, but better. Such a thinking machine can take seemingly unrelated observations and draw deep conclusions about the world. Kind of like a movie detective, but in real life. It wouldn’t need hundreds of years of zoology to realize that life on Earth arose from natural selection. It would be done within a week.

The super-human thinking machine would only need to look at the micro-expressions on Alice’s face for a couple of minutes to have a pretty good guess about what is bothering her (assuming some prior knowledge of human psychology, of course). Things that look mysterious to us humans may well be due to an underlying pattern that we have been unable to discover, but which a super-human pattern recognizer could identify easily. Such a machine would almost certainly discover new laws in chemistry, biology and psychology, as those fields have not been formalized as thoroughly by human scientists, and it might even discover new laws of physics.

All this, based on the cognitive skill of induction.

(As an aside, we actually do have a formal theory of induction – Solomonoff induction – so this isn’t all just fancy speculation. The catch is that this formal theory requires unlimited computing power.)

There is a complementary skill, heavily intertwined with the skill of identifying patterns – so much so that you probably cannot do one well without the other: the skill of generating all the expected observations from the laws you have distilled (what you might call deduction).

Again, humans are pretty good at this. If you know Alice well and you hear that she recently learned her grandmother has died, then you are able to predict her behavior to some degree. You’ll be able to predict that she’ll be sad, for one, but you might also be able to predict more specific things she will do.

Physicists and engineers are masters at this for the physical world: they can take the laws of physics (as we know them) and predict whether a bridge will remain standing or will collapse.

In both cases, you take something you know and derive its implications. For example, if you know the theory of natural selection well, and you observe that a species has an equal number of males and females (as is the case for many species on Earth), then you can deduce from Fisher’s principle that the males and females of this species are likely in free competition within their respective sex and will likely behave, in general, somewhat selfishly. Ants are a contrast case: they usually do not have an equal number of males and females, most of the females are not in free competition, and they do not act selfishly. You can also deduce that parental investment is roughly equal for male and female offspring in that species. Thus, by knowing a law such as Fisher’s principle, we are able to deduce many things about the world – like the fact that humans are more selfish than ants – just from knowing their sex ratios.
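The core of Fisher’s principle can even be checked with a toy calculation. Under its assumptions (equal parental investment, free competition), every offspring has exactly one father and one mother, so the rarer sex gets a larger per-capita share of matings. A minimal sketch, with made-up population numbers:

```python
# Toy illustration of Fisher's principle: with equal parental investment,
# the rarer sex has higher expected reproductive success, so selection
# pushes the population's sex ratio back toward 1:1.

def expected_offspring(males, females, total_offspring=1000):
    # Every offspring has one father and one mother, so total paternity
    # equals total maternity -- it is just divided among fewer individuals
    # for whichever sex is rarer.
    per_male = total_offspring / males
    per_female = total_offspring / females
    return per_male, per_female

# In a male-scarce population, producing sons is the better bet...
per_male, per_female = expected_offspring(males=300, females=700)
assert per_male > per_female

# ...until the ratio reaches 1:1, where neither sex has an edge.
per_male, per_female = expected_offspring(males=500, females=500)
assert per_male == per_female
```

So any heritable bias toward the rarer sex spreads until the ratio equalizes, which is why the 1:1 ratio is so informative when you observe it.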

If we imagine a thinking machine that is super-human at this task, we can see that it can deduce all sorts of things about the world (including human psychology) that we humans haven’t thought of. Just from the theory of natural selection, it can predict a lot about the human psyche. By knowing the laws of physics and chemistry, a super-human thinking machine would be able to predict even outcomes on the nanoscale, like how proteins fold.

As we said, deduction and induction are intertwined. One way to do induction (that is, synthesizing general laws from observations) is to randomly generate possible laws (ideally starting with simple laws), then deduce all the implications of these laws and finally check which ones match observed reality. However, this random guessing is of course not computationally efficient in any way, and not how humans do it. Still, the fact that AlphaFold succeeded where humans failed – in predicting protein folding, that is – is at least some evidence that humans are not unbeatable in terms of deduction.
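The guess-deduce-check loop just described can be sketched in a few lines – a deliberately inefficient toy, with a hand-picked hypothesis space over number sequences:

```python
# Induction by brute-force guessing: enumerate simple candidate laws,
# deduce each law's predictions, and keep only the laws consistent with
# the observations. (Deliberately inefficient -- illustration only.)
import math

observations = [2, 4, 8, 16, 32]

# A tiny hypothesis space of candidate "laws", roughly ordered by simplicity.
candidate_laws = {
    "n-th even number": lambda n: 2 * n,
    "n squared": lambda n: n * n,
    "2 to the n": lambda n: 2 ** n,
    "n factorial": lambda n: math.factorial(n),
}

def consistent(law):
    # Deduction step: generate the law's predicted observations and
    # compare them to what was actually observed (n starting at 1).
    return all(law(n) == obs for n, obs in enumerate(observations, start=1))

surviving = [name for name, law in candidate_laws.items() if consistent(law)]
print(surviving)  # → ['2 to the n']
```

A real reasoner cannot afford to enumerate hypotheses like this, of course; the point is only that induction decomposes into guessing plus deduction plus checking.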

(We also kind of have a formal theory of deduction, but again, it’s not exactly computationally efficient.)

With human-level understanding of physics and biology, we can develop amazing things like mRNA vaccines and computer chips. If we crank up the ability to model the real world, even more magical-seeming technologies become possible. But there is one more building block missing, in order to go from a super-human ability to predict the world to the ability to develop new technology. And that is the ability to plan a series of steps leading to a goal.

Given some goal, a human can develop a plan to achieve that goal, based on an internal model of the outer world. I mean ‘plan’ here in the widest sense possible – a chain of causally linked steps through time and space. It includes things like ‘planning to go buy more milk at the supermarket’. Say, you have noticed that you are out of milk, and the world model in your brain tells you that you can buy milk at a place called ‘supermarket’. But how to get there? The plan might involve these steps:

  1. Put on shoes and a jacket and pocket your car keys and your wallet.
  2. Go to your car. (Note that the car might technically be farther away from the supermarket than your home! But you still know it’s faster to first go to your car.)
  3. Start the car with your keys and start driving. (Note that driving itself is full of complicated actions you have to take, but we’re going to gloss over that here.)
  4. Take the fastest (not necessarily the shortest) route there.
  5. Park somewhere close to the supermarket and go inside.
  6. Look for milk, take it and pay with the money in your wallet.

This is a non-trivial plan! Your cat likely would not be able to come up with it. And yet you do it so effortlessly! It takes you barely a second to think of it. (Though to be fair, your brain has likely cached this plan from a previous occasion.)

Note also, how much knowledge of the real world the plan required. You had to know that you need money to buy things from a supermarket. You had to know that driving a car is faster than walking, and that you will get into trouble if you do not follow the traffic laws of the country you’re in.

We can see that without a solid understanding of the world, the ability to develop plans (aka chains of actions) does not gain you much – at most it will lead to ineffectual flailing. And vice versa, even if an agent has the most amazing internal world model, it will not affect the world much if the agent does not develop plans and does not take action. But combined, this is very powerful.

Can you now imagine a thinking machine with a super-human ability to plan? It may help to picture how it could roughly work: it could simulate the world according to its world model (that it acquired through induction on observations and deduction from universal laws) and try out plans in this simulation to check whether they lead to the desired outcome.

One way to visualize this would be to imagine it as if the thinking machine had developed a virtual reality version of the real world, and then in this virtual reality it could try out plan after plan to see what the predicted outcome is. (Of course, in a real AI, it wouldn’t really work like this, because this is computationally very inefficient, but this is to show that with enough computing power, it is definitely possible to be super-human at planning.) The thinking machine could go through millions of plans per second to identify the best one – the one that is the most robust, has the highest chance of success, and leads to the desired goal. And assuming the world model was (more or less) accurate, the plan will work.
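Here is a crude sketch of that simulate-and-test loop in a made-up toy world (the grid, the moves and the goal are all my invention for illustration): candidate plans are rolled out in an internal model of the world, scored, and the best one is kept.

```python
# Simulate-and-test planning: generate candidate plans, roll each one out
# in an internal world model (here, a simple grid), keep the best.
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
START, GOAL = (0, 0), (3, 2)

def simulate(plan, start=START):
    # The planner's "world model": predict where a sequence of actions ends up.
    x, y = start
    for action in plan:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
    return (x, y)

def score(plan):
    # Closer to the goal is better; among equally close plans, shorter wins.
    x, y = simulate(plan)
    distance = abs(x - GOAL[0]) + abs(y - GOAL[1])
    return (distance, len(plan))

random.seed(0)
candidates = [
    [random.choice(list(MOVES)) for _ in range(random.randint(1, 8))]
    for _ in range(20_000)
]
best = min(candidates, key=score)
print(simulate(best))  # where the best imagined plan ends up
```

With enough candidates, the best imagined plan reaches the goal – here by sheer blind sampling, which is exactly the computationally wasteful version the text warns about; a real planner would search far more cleverly.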

From the outside perspective, it will look like magic! To you, it might seem like the thinking machine is taking random actions – just like your cat does not understand how you got food after you got into that metal box with wheels – but in the end, the goal will be achieved. The only way to prevent this is to lock the thinking machine into a box and never let it interact with the outside world at all. (Not even via text messages!) If it can’t take any actions, it really can’t affect the world, but then it’s also useless.

Another analogy might be video game speedrunning. In a speedrun, the gamer knows the game mechanics so well that they can exploit them far beyond what the game developers anticipated, in order to achieve victory much faster than intended. You could imagine a superintelligence speedrunning our world by trying to identify all the exploits that are present in human technology, human psychology and the physical world, in a way that we humans just cannot predict, because the thinking machine is simply better at finding exploits.

Some of the intermediate steps in a plan might look familiar to humans. If you give a super-human planner the goal of ‘mine some helium-3 on the moon’, then its search for plans/chains of actions will likely conclude that it is a good idea to build a rocket. To that end, it might need to learn more about rocketry and manufacturing technology, so it will develop a sub-plan to get additional information, and so on. (I’m skipping over some of the technical problems here, like, “how to estimate the value of the information that a textbook on rockets contains before you read it?”) But humans will not be able to predict all the steps, because otherwise we would be as good planners as the machine, and then it isn’t super-human anymore.

I don’t know of a fundamental law for why super-human planning isn’t possible. It’s just a matter of scaling up cognitive algorithms that we are already running in our brains. And with all the computing power we have nowadays, this seems very much possible.

At this point, you might wonder whether humans have any other cognitive skills beyond what I described here – some other secret sauce. What about creativity? Wisdom? The human spirit? – Those things still exist of course, but it seems to me that you can emulate these more specific cognitive capabilities with the more general-purpose algorithms I described above. I think it should not be a completely crazy claim by now that an AI that has modeled humans in sufficient detail will be able to replicate human creativity. Machine learning models like DALL-E are trained on many, many examples of human creativity and have distilled at least a small aspect of the rules that underpin it. And to do that, they used general-purpose search techniques like gradient descent. This shows that general optimization techniques can at least approximate some aspects of human thinking. It stands to reason that in the future, machine learning algorithms will be able to emulate even more of it. Indeed, deep and general optimization algorithms have shown themselves to be the most capable time and time again in the history of AI research.

For the skills I have described, you may also see an underlying general-purpose algorithm, a kind of efficient search: efficiently searching for universal rules that explain observations, efficiently searching for valid implications, efficiently searching for chains of actions. As mentioned, gradient descent is one search algorithm – more efficient than evolutionary algorithms, but restricted to finding real-valued vectors that minimize a differentiable objective. It’s likely that the human brain shares search-related machinery among the three skills and an AI could too, such that an improvement in its general-purpose search algorithm would affect all of its capabilities.
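As a concrete instance of such a search, here is gradient descent in its most minimal form, finding the real-valued vector that minimizes a simple differentiable objective (the objective itself is a made-up example):

```python
# Minimal gradient descent: a general-purpose search over real-valued
# vectors, here minimizing f(x, y) = (x - 3)**2 + (y + 1)**2,
# whose minimum is at (3, -1).

def grad(x, y):
    # Analytic gradient of f -- the direction of steepest ascent.
    return 2 * (x - 3), 2 * (y + 1)

x, y = 0.0, 0.0           # arbitrary starting point
learning_rate = 0.1
for _ in range(200):      # repeatedly step downhill
    gx, gy = grad(x, y)
    x -= learning_rate * gx
    y -= learning_rate * gy

print(round(x, 3), round(y, 3))  # → 3.0 -1.0
```

Note what the restriction mentioned above means in practice: this loop needs a gradient, so it can only search spaces where “slightly better” is a meaningful, smooth notion – unlike the discrete spaces of laws and plans in the earlier examples.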

To recap: a super-human thinking machine will, compared to us, be able to infer more of the underlying patterns from fewer observations; it will be able to more accurately predict the future based on all the knowledge it has gathered; and it will be able to use that to make more complicated and much more effective plans to achieve its goals/optimization targets. If a machine superintelligence that has, say, access to the Internet is trying to achieve some goal that you do not agree with, there is basically nothing you can do to stop it – if it has already developed a sufficiently detailed world model. Everything you might try, the AI has already predicted and has developed a contingency plan for. You have no hope of fighting it and in fact, if it came to it, you would be dead before you even knew you were in a fight to begin with, because that is just the easiest way to win a conflict.

You can try to be clever and do something ‘unpredictable’ to throw off the superintelligence, and while that may work in terms of making it more uncertain about your next move, the truth is that you lack the deep understanding of the physical world and the delicate planning ability required to make moves that are actually dangerous from the AI’s perspective – at least if you have let it run unsupervised long enough for it to copy itself all over the Internet. It doesn’t matter how random you are when the nanobots administer botulinum toxin to everyone in the world at once – if the AI happens to want to rid itself of human interference.

There is another lesson here, which is that the human level of cognition does not seem like a natural barrier on the way to even better cognition, or, put differently, if you build an AI with human-level “intelligence”, then it will not be difficult at all to make it even smarter – in the easiest case, just give it more GPUs to run on. That is, you might think that if we ever cracked the secret behind intelligence, then we would get AIs that are roughly human-level in terms of cognitive capabilities. But the human level of induction/deduction/planning simply isn’t anything special and there is no reason to think that AIs will end up around that level. The fastest humans can run at about 40 km/h, and with human anatomy it’s hard to get past that speed, but if you have invented wheels and a combustion engine, then 40 km/h is not a special level of speed that is hard to break – indeed, cars can go much faster than that. Of course, cars have new limits, imposed by things like air resistance and overheating tires, but when considering those limits, the human limit is almost completely irrelevant.

Similarly, a machine intelligence likely has very different limits than humans when it comes to induction, deduction and planning. Not least because the size of human brains is limited by our cranial capacity whereas a machine intelligence can grab all the computing hardware lying around to improve its capabilities.

All this means that there likely will not be much time[1] between someone’s discovery of how to train a machine learning model to gain the cognitive capabilities mentioned above, and the moment when a datacenter somewhere is hosting a machine intelligence that is much more capable than us, at which point it is probably too late to change its goals. We will not be able to experiment on roughly-human-level AIs – in order to carefully tune and align them and to set up a lawful AI society – before we get to super-human AI. Most likely, we will get one shot to ensure the first AI’s optimization targets align with ours. And then it’s out of our hands.


  1. Maybe a year between those two points in time? ↩︎