Our world will not be saved by those who talk. It will be saved by those who roll up their sleeves and get to work (and by those who support them). Luke Muehlhauser
I generally open new books and articles about AI risk with some trepidation. Usually, people who write about these issues for a popular audience show little familiarity with the scholarly literature on the subject. Instead, they cycle through a tired list of tropes from science fiction; for example, that robots will angrily rebel against their human masters. That idea makes for some exciting movies, but it’s poor technological forecasting.
I was relieved, then, to see that Barrat has read the literature and interviewed the relevant experts.
As I see things, the key points Barrat argues for are these:
- Intelligence explosion this century (chs. 1, 2, 7, 11). We’ve already created machines that are better than humans at chess and many other tasks. At some point, probably this century, we’ll create machines that are as skilled at AI research as humans are. At that point, they will be able to improve their own capabilities very quickly. (Imagine 10,000 Geoff Hintons doing AI research around the clock, without any need to rest, write grants, or do anything else.) These machines will thus jump from roughly human-level general intelligence to vastly superhuman general intelligence in a matter of days, weeks or years (it’s hard to predict the exact rate of self-improvement). Scholarly references: Chalmers (2010); Muehlhauser & Salamon (2013); Muehlhauser (2013); Yudkowsky (2013).
- The power of superintelligence (chs. 1, 2, 8). Humans steer the future not because we’re the strongest or fastest but because we’re the smartest. Once machines are smarter than we are, they will be steering the future rather than us. We can’t constrain a superintelligence indefinitely: that would be like chimps trying to keep humans in a bamboo cage. In the end, if vastly smarter beings have different goals than you do, you’ve already lost. Scholarly references: Legg (2008); Yudkowsky (2008); Sotala (2012).
- Superintelligence does not imply benevolence (ch. 4). In AI, “intelligence” just means something like “the ability to efficiently achieve one’s goals in a variety of complex and novel environments.” Hence, intelligence can be applied to just about any set of goals: to play chess, to drive a car, to make money on the stock market, to calculate digits of pi, or anything else. Therefore, by default a machine superintelligence won’t happen to share our goals: it might just be really, really good at maximizing ExxonMobil’s stock price, or calculating digits of pi, or whatever it was designed to do. As Theodore Roosevelt said, “To educate [someone] in mind and not in morals is to educate a menace to society.” Scholarly references: Fox & Shulman (2010); Bostrom (2012); Armstrong (2013).
- Convergent instrumental goals (ch. 6). A few specific “instrumental” goals (means to ends) are implied by almost any set of “final” goals. If you want to fill the galaxy with happy sentient beings, you’ll first need to gather a lot of resources, protect yourself from threats, improve yourself so as to achieve your goals more efficiently, and so on. That’s also true if you just want to calculate as many digits of pi as you can, or if you want to maximize ExxonMobil’s stock price. Superintelligent machines are dangerous to humans not because they’ll angrily rebel against us, but because, for almost any set of goals they might have, it’ll be instrumentally useful for them to use our resources to achieve those goals. As Yudkowsky put it, “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.” Scholarly references: Omohundro (2008); Bostrom (2012).
- Human values are complex (ch. 4). Our idealized values (i.e., not what we want right now, but what we would want if we had more time to think about our values, resolve contradictions in our values, and so on) are probably quite complex. Cognitive scientists have shown that we don’t care just about pleasure or personal happiness; rather, our brains are built with “a thousand shards of desire.” As such, we can’t give an AI our values just by telling it to “maximize human pleasure” or anything so simple as that. If we try to hand-code the AI’s values, we’ll probably miss something that we didn’t realize we cared about. Scholarly references: Dolan & Sharot (2011); Yudkowsky (2011); Muehlhauser & Helm (2013).
- Human values are fragile (ch. 4). In addition to being complex, our values appear to be “fragile” in the following sense: there are some features of our values such that, if we leave them out or get them wrong, the future contains nearly 0% of what we value rather than 99% of what we value. For example, if we get a superintelligent machine to maximize what we value except that we don’t specify consciousness properly, then the future would be filled with minds processing information and doing things but there would be “nobody home.” Or if we get a superintelligent machine to maximize everything we value except that we don’t specify our value for novelty properly, then the future could be filled with minds experiencing the exact same “optimal” experience over and over again, like Mario grabbing the level-end flag on a continuous loop for a trillion years, instead of endless happy adventure. Scholarly reference: Yudkowsky (2011).