This post is in the series “taking metaphors and jokes much too seriously, in the hope that some underlying insight will be gained”. I’ve been finalising a draft section on “the ethics of AI” in the new book (summary – there are no new issues there which were not already important questions of “business ethics” or indeed “political philosophy”, because opaque and complicated decision-making systems aren’t new). And consequently, I’ve come up against a favourite example of online AI panickers – the “Paperclip Maximiser”.
The idea here is that someone creates a paperclip-making robot, but equips it with an AI controller that’s just a little bit too good. The AI is able to reprogram itself and improve its intelligence. Every time it improves its intelligence, it can see new ways to reprogram itself, so it gets cleverer at an exponential rate. Quickly, a “singularity” is reached, at which point it becomes so intelligent, so quickly, that the difference between the paperclip AI and us is greater than the difference between humans and amoebae.
For some reason, while becoming godlike in its intelligence and developing self awareness, it never shakes off its weird obsession with paperclips.
No, I can’t make sense of that bit either. But anyway, because it’s a superintelligent being which only cares about paperclips, it quickly thinks of ways to commandeer more and more resources, conquers Earth, develops intergalactic travel, and the whole universe gets taken apart molecule by molecule to make paperclips. Hence, we need to spend a lot of resources right now on “AI alignment”, seemingly starting by buying a really nice castle to serve as a base for Longtermist philosophers to hold conferences.
This sounds very silly, but lots of people take it seriously. Below, I present a selection of arguments why they shouldn’t.
I started off thinking in business school terms, saying something like “Even granting its capabilities and preferences, why doesn’t this super-AI have any concept of the value of time? Why would it invest so much of its effort in planet-conquering and space-exploring projects that don’t produce a single paperclip up front?”
This objection can be dealt with, I think. If we assume that the superintelligence is immortal, then it can be credibly argued that it shouldn’t have a discount rate – since it will be the same AI in a billion years’ time, a paperclip today is worth no more than a paperclip in a billion years’ time.
But I’m not that easy to get rid of, as much of the internet knows. A paperclip maximiser AI might not have a pure discount rate, but uncertainty and risk can do the same job. A paperclip in the hand is worth more than one possible paperclip in the bush, simply because a lot of things might go wrong in a billion years.
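To put some toy numbers on that (the survival probability is invented, and the Python is my choice, not anything in the paperclip literature): a constant chance of the plan dying each year acts exactly like a discount rate, even for an agent with no intrinsic preference for the present.

```python
import math

# Invented assumption: each year there's a 0.1% chance the
# billion-year plan dies (rival AI, supernova, plain bad luck).
ANNUAL_SURVIVAL_PROB = 0.999

def value_of_future_paperclip(years: int) -> float:
    """Expected paperclips today from a promise of one paperclip in
    `years` years, if the plan must survive every intervening year."""
    return ANNUAL_SURVIVAL_PROB ** years

for horizon in (1, 100, 10_000, 1_000_000):
    print(f"{horizon:>9} years out: {value_of_future_paperclip(horizon):.6f}")

# The same survival probability, read as an implied discount rate:
print(f"implied discount rate: {-math.log(ANNUAL_SURVIVAL_PROB):.5f} per year")
```

At a one-in-a-thousand annual failure rate, a paperclip promised a million years from now is worth essentially nothing today – which is the bird-in-the-hand point in numbers.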
To deal with this objection, the AI-panicking community has to really soup up its example; the paperclip maximiser has to be so smart that it can perfectly anticipate everything that might happen (or sufficiently nearly perfectly to make no odds). It doesn’t have to worry about risk, because it doesn’t face any uncertainty.
However, I have in my back pocket a really useful little paper from Metroeconomica by Barkley Rosser and Roger Koppl. It analyses what von Neumann and Morgenstern called “the Holmes-Moriarty problem”.
Sherlock Holmes, pursued by his opponent, Moriarty, leaves London for Dover. The train stops at a station on the way, and he alights there rather than travelling on to Dover. He has seen Moriarty at the railway station, recognizes that he is very clever and expects that Moriarty will take a faster special train in order to catch him in Dover. Holmes’ anticipations turn out to be correct. But what if Moriarty had been still more clever, had estimated Holmes’ mental abilities better and had foreseen his actions accordingly? Then, obviously, he would have travelled to the intermediate station [Canterbury]. Holmes again would have had to calculate that, and he himself would have decided to go on to Dover. Whereupon, Moriarty would again have “reacted” differently. Because of so much thinking they might not have been able to act at all, or the intellectually weaker of the two would have surrendered to the other in Victoria Station, since the whole flight would have become unnecessary.
This problem was used by Morgenstern to motivate the concept of a “mixed strategy” – sometimes in game theory, it’s necessary to pick a move at random, according to optimised probabilities, to avoid getting into binds like this. As Rosser and Koppl show, the result is actually a bit stronger than that – the Holmes/Moriarty problem is formally uncomputable.
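For the curious, here’s a back-of-envelope version of the game in Python. The payoffs are my own inventions rather than anything canonical – they just encode “Moriarty wins if he guesses Holmes’s station, and Holmes escaping to Dover is the worst outcome for him” – but the standard 2x2 zero-sum formulas then pin down how each side should randomise.

```python
# The Holmes–Moriarty game as a 2x2 zero-sum game (illustrative payoffs).
# Payoff to Moriarty; rows = Moriarty's station, cols = Holmes's station.
payoffs = {("Dover", "Dover"): 100, ("Dover", "Canterbury"): 0,
           ("Canterbury", "Dover"): -50, ("Canterbury", "Canterbury"): 100}

a = payoffs[("Dover", "Dover")]
b = payoffs[("Dover", "Canterbury")]
c = payoffs[("Canterbury", "Dover")]
d = payoffs[("Canterbury", "Canterbury")]

# Standard indifference conditions for a 2x2 zero-sum game with no
# saddle point: each player mixes so that the opponent gains nothing
# by switching between pure strategies.
p_moriarty_dover = (d - c) / (a - b - c + d)    # Moriarty's P(Dover)
q_holmes_dover = (d - b) / (a - b - c + d)      # Holmes's P(Dover)
game_value = (a * d - b * c) / (a - b - c + d)  # expected payoff to Moriarty

print(f"Moriarty goes to Dover with probability {p_moriarty_dover:.2f}")
print(f"Holmes stays on to Dover with probability {q_holmes_dover:.2f}")
print(f"Value of the game to Moriarty: {game_value:.1f}")
```

With these made-up numbers, Moriarty should head for Dover 60% of the time and Holmes should stay on the train only 40% of the time; the exact probabilities don’t matter, the point is that the only stable way to play is at random.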
And this in turn means that a perfectly informed paperclip maximiser is only possible if it is the only such machine in the world, or if all the other AIs are perfectly aligned to its goals. If there’s another maximising machine that wants something different, the paperclip maximiser will be in competition with it for resources. Consequently, it will find itself in situations where it can only act by following a mixed strategy, which in turn means there’s uncertainty about outcomes, which means that something like a discount rate needs to be applied to value current paperclips more highly than future ones.
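If you want to see that mechanism in action, here’s a toy simulation (every number in it is made up): two maximisers randomising over the same two resource pools, with contested resources going to waste.

```python
import random

random.seed(0)  # reproducible toy run

# Two maximisers randomise (their mixed strategies) over the same two
# resource pools each round. A collision wastes the contested resources;
# an uncontested round banks one paperclip. All probabilities invented.
P_CLIP_PICKS_A = 0.5
P_RIVAL_PICKS_A = 0.5

def paperclips_over(rounds: int) -> int:
    clips = 0
    for _ in range(rounds):
        clip_pool = "A" if random.random() < P_CLIP_PICKS_A else "B"
        rival_pool = "A" if random.random() < P_RIVAL_PICKS_A else "B"
        if clip_pool != rival_pool:  # uncontested: the clip gets made
            clips += 1
    return clips

trials = [paperclips_over(1000) for _ in range(200)]
print(f"mean clips per 1000 rounds: {sum(trials) / len(trials):.0f} "
      f"(vs 1000 with no rival)")
```

The machine that could bank a clip for certain today ends up with roughly half a clip per round in expectation once a rival is randomising against it – exactly the gap a discount rate exists to describe.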
What I’m trying to show here is that whatever the risks of non-aligned AIs might be, they can’t be based on Paperclip Maximisers, because maximising behaviour itself doesn’t work in a realistic post-Singularity world of ubiquitous superintelligent robots. Like hedge fund managers who underperform the market, they can’t all meet their goals – not because they aren’t clever enough, but because they’re in competition with equally supersmart entities. What you would have to see in such a world would be universally satisficing entities, which tried to make reasonable progress toward their goals, but which developed a wider concept of viability and flourishing.
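To be concrete about the maximise/satisfice distinction (the options and scores below are invented purely for illustration): a maximiser has to get the entire ranking of all possible plans right, while a satisficer only has to find one plan that clears a “good enough” bar.

```python
# A toy maximiser and satisficer choosing between invented plans.
# All option names and scores are made up for illustration.
options = [
    ("build a clip factory", 120),
    ("run the factory harder", 150),
    ("grab the solar system", 10**9),
    ("do nothing", 0),
]

def maximise(opts):
    # Must rank every option correctly; the extreme plan always wins.
    return max(opts, key=lambda o: o[1])

def satisfice(opts, good_enough=100):
    # Take the first plan that clears the bar, and keep the rest of
    # the effort budget for staying viable.
    for name, score in opts:
        if score >= good_enough:
            return (name, score)
    return max(opts, key=lambda o: o[1])  # fall back if nothing clears

print(maximise(options))   # ('grab the solar system', 1000000000)
print(satisfice(options))  # ('build a clip factory', 120)
```

The satisficer settles for building the factory; only the agent committed to finding the literal global optimum is driven to the “grab the solar system” plan.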
Which is where we came in – it really is silly to think that something could both be millions of times smarter than the smartest human being who ever lived or ever will, and nonetheless give a toss about paperclips.
There’s something interesting about the way hypotheticals like the “paperclip maximizer” take over discussions. It’s as though they are considered, for rhetorical purposes, to have the status of a logical proof when in reality, as our host demonstrates, they have no such status. Reminds me of the hypothetical terrorist who knows the location of a nuclear bomb in discussions about torture.
If you think no ASI would blithely maximise whatever metric it's been given, the burden of proof is really on you. It's not a domain where common sense has any traction.