This feels vaguely newsworthy because of the screenwriters’ strike, but the relationship between artificial intelligence and art interests me. One thing that I found out while doing research for the next book is that this subject really does have a history. For at least as long as there’s been a concept of artificial intelligence (which is to say, significantly longer than there have been useful mass-market electronic computers), people have been trying to make art with it. One of Stafford Beer’s earliest collaborators was a guy called Gordon Pask, who did the most wonderful and bizarre things, including early work on chemical computing and feedback in education, but who also built things like the “Colloquy of Mobiles” and a system called “Musicolour” that would adapt a light show to the sounds being played.
Another of Pask’s collaborators was an artist called Roy Ascott, who taught a course at the art school in Ipswich which Brian Eno attended. That (along with having been given a copy of Brain of the Firm by his mother-in-law) was a big formative influence on Eno; ambient music is probably one of the last niches in which cybernetic ideas are still popular.
The interesting thing to me, though, is that if you look at “generative art” and the way in which Eno and others have used ideas related to algorithmic creativity, those approaches are really very unlike the current explosion. Looking at a work like “In C” or “Seventy Seven Million Paintings” or “Longplayer”, you can see that the underlying concept is that you take very simple cells of ideas, combine them with equally simple rules, and allow the magic of combinatorics (the branch of mathematics dealing with things multiplying up extremely fast, the sort of thing that tells you there are 43 quintillion possible states for a Rubik’s Cube) to generate surprising results.
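A toy sketch of the idea in Python (entirely my own invention, not Riley’s or Eno’s actual procedure): a handful of fixed cells and one trivial combining rule already give you a space of possible pieces far too large ever to exhaust.

```python
import random

# Six simple "cells" -- short note patterns, in the spirit of In C.
cells = ["C-E-G", "E-G-C", "G-C-E", "C-C-E", "E-E-G", "G-G-C"]

# One simple rule: a piece is any sequence of 20 cells, chosen freely.
# The combinatorics do the heavy lifting: even this tiny system allows
# 6**20 distinct pieces -- about 3.7 quadrillion of them.
print(f"possible pieces: {6 ** 20:,}")

# Generate one of them.
piece = [random.choice(cells) for _ in range(20)]
print(" | ".join(piece))
```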
ChatGPT and DALL-E and the like are doing almost the diametric opposite: they start with huge amounts of variety and information, operate on them with extremely large and complicated rule sets, and try to produce, in a sense, maximally unsurprising outcomes.
The Rubik’s Cube is often a good way to think about very large combinatoric systems, loosely and analogically. The Eno/ambient school are taking the cube, turning the faces iteratively and seeing what pretty patterns it might generate. Neural network models are trying to solve the cube – their algorithm is trained in such a way as to reward it for moving things to a more recognisable state.
I’m currently doing an online course to get a certificate as an AI developer – not necessarily out of any belief I could be one, but just because I felt that if I’m going to write about this subject area, I ought to at least do the equivalent of picking up a spanner. Very early days, but my initial sensation was of surprise that the representation of the world in the model is so, literally, flat. The adjective in “large language model” modifies the word “model”, not “language”; it’s a large model that’s extremely restricted in terms of its picture of the language.
One of the first exercises you do on any of these courses is a really simple image recognition task – the MNIST dataset of 60,000 small images of handwritten numbers (plus another 10,000 to test the model on). And it interested me that the way you set up the model in that exercise is to read the images into a data frame, turning each 28x28 pixel image into a string of 784 numbers. The first step is to literally flatten things out.
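Course exercises differ in the details, but the flattening step typically looks roughly like this (a sketch in Keras; other frameworks have equivalents):

```python
from tensorflow.keras.datasets import mnist

# 60,000 training images and 10,000 test images, each 28x28 pixels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)

# The literal flattening: each 2-D image becomes one row of 784 numbers.
x_train_flat = x_train.reshape(-1, 28 * 28).astype("float32") / 255.0
x_test_flat = x_test.reshape(-1, 28 * 28).astype("float32") / 255.0
print(x_train_flat.shape)  # (60000, 784)
```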
As the course goes on I might realise this is totally wrong, but at present, my mental model of how modern generative AI models work is that it’s a sort of “Swiss Cheese” system. You have all of these strings of numbers pouring down from heaven, then the model is a load of slices of Swiss Cheese that you move back and forth so that some of the holes line up. The training phase (the bit that uses the scary amounts of compute) is where you jiggle and wiggle the cheese slices so as to maximise the chance that the thing which drops out the bottom is (a string of numbers that can be reformatted as) a bit of art.
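To push the analogy into code: if you squint, the cheese slices are the layers of a network and the jiggling is gradient descent over their weights. A minimal version of the MNIST exercise, continuing with the flattened data above (the specifics here are illustrative, not canonical):

```python
from tensorflow.keras import layers, models

# Each Dense layer is one slice of cheese; its weights determine which
# holes line up. The width (128 here) bounds what the slice can represent.
model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per digit 0-9
])

# Training is the jiggling and wiggling: gradient descent nudges the
# weights so that the thing which drops out the bottom is, as often as
# possible, the right answer.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train_flat, y_train, epochs=5, validation_split=0.1)
```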
That means that there’s intrinsically a bottleneck – the range of things that the system can produce is going to depend partly on the amount of time and compute available for optimising the jiggling of the slices, but to a much greater extent on the width of the slices themselves (their ability to represent a range of possible rules and outcomes). And you can see how, in this model, the well-known factual “hallucinations” develop – they are stuff that drips out of the system because it looks like the desired output.
And that bottleneck is going to restrict the range of what comes out, centring it on something that is in some mathematical sense the “average” of the possible space of art. To get more of an idea of what I’m talking about, try the fun exercise of persuading ChatGPT (the GPT-3 version) to write a non-rhyming poem; it really doesn’t want to, although apparently GPT-4 is much better. It seems to me that this is very much not the same thing as what was previously thought of as AI art, and that it’s unfortunately deep in the structure of the thing that it’s not going to be as interesting.