I think I stuck pretty well to my promise to “write about something else for a while”, but artificial intelligence (and specifically, AI based on large language models) is back on my mind. I have two, related, half-formed thoughts.
First – where are we on the timeline? I don’t, personally, think that the famous “hype cycle” is a particularly useful way of thinking about anything, but products do have take-up curves, and new technologies do follow a fairly consistent pattern of commercialisation. On social media this week, I advanced the idea (and nobody seemed to disagree particularly fervently) that if we were making comparisons, a first step might be to declare the zero point (the “BC to AD” moment, so to speak) as November 2022, the public release of the ChatGPT demo.
I’d be tempted to match this up to a corresponding zero point in the dot com era: the public release of Netscape Navigator in December 1994 (again, arguable, but let’s go with it). That would imply that LLMs have now reached about May 1996. According to the history books, at this point in the dot com cycle somewhere between 23% and 36% of adults “went online”; I feel like I might pair this with the 37% of adults responding to an industry survey saying that they “use AI chatbots daily in their work”. And culturally, I also feel a sort of rhyme with the number of people who think that the whole thing is stupid, that they’re not missing out, and who want to say so loudly.
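As an aside, the mapping here is nothing more than a fixed offset between the two anchor points; a rough sketch of that arithmetic is below (the exact days, and the little helper function, are my own placeholders – the comparison only really anchors to months):

```python
from datetime import date

# Rough sketch of the offset arithmetic above. The day-level values are
# placeholders – the argument only anchors to months, not exact dates.
NETSCAPE_NAVIGATOR = date(1994, 12, 1)   # public release, December 1994
CHATGPT_DEMO = date(2022, 11, 30)        # public ChatGPT demo, November 2022

def dotcom_equivalent(today: date) -> date:
    """Map a present-day date onto the dot com timeline by fixed offset."""
    return NETSCAPE_NAVIGATOR + (today - CHATGPT_DEMO)

print(dotcom_equivalent(date(2024, 5, 1)))  # -> 1996-05-02, i.e. roughly May 1996
```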
When it comes to commercial products, though, the comparison is very different. At this point in the dot com cycle, Amazon has been going for about six months, so has eBay, and DoubleClick is quickly consolidating the online ad space. There were plenty of people claiming to be “reinventing their business around the internet”, as with today’s pivots to AI, but there were also lots of very big, clearly useful and viable businesses (even if they weren’t all showing an accounting profit, and many never did) which had been built on the new technology.
Comparing that era to today, you can see that the economics are different. Building and training an LLM is much more capital-intensive and expensive than setting up an ISP; it’s huge amounts of capex even by telecom industry standards. And there aren’t the same network effects; one reason that I’m setting 1994 as the start point is that this was also when people started to understand that it made a big difference if, as was true of the Web, “consumers were also producers” – that people using the product also contributed to the value of the overall system. That was even more the case with Web 2.0, of course. Users of LLMs or chatbots don’t have that kind of effect; they might arguably produce a bit of training data, but it’s not the sort of thing that is going to drive an exponential curve when the rest of the thing is hitting diminishing returns so hard.
But even so, my gut feel is that “AI”, in the sense of “applications of LLMs”, is behind the curve – there are a lot of people selling picks and shovels, but nobody seems to be mining gold. There’s still no “killer app” in the sense of something that people and businesses will make a big capital investment simply in order to use. To the extent that we have success stories, they seem to be in coding copilots, but we’ve known for at least fifty years (“The Mythical Man Month”) that improvements in the productivity of individual programmers do not translate to more efficient delivery of commercially viable products in any straightforward way. (I’d actually argue that “innovations which can 10x the productivity of a coder” aren’t even all that rare – a lot of high-level programming languages would have been in a position to make this claim when they were introduced.)
All of which is sort of preparatory – although I started thinking about this while comparing the really quite mature business model of DoubleClick circa 1997 with the “oh yeah, we’re putting shopping buttons in the chatbot” announcement from OpenAI[1], what was on my mind beforehand was a different comparison, between LLMs and internet search.
If you search for something and don’t find it, one of two things can be the case. Either “Google can’t find it” (it’s not on a page that’s indexed by search engines, or it’s buried in SEO spam, or the specific search term you used wasn’t the right one). Or, “it’s not on the internet” – however many changes are made to the algorithm, and however clever your Google skills, you won’t find this reference because it’s not in the dataset.
We don’t seem to make this distinction when talking about LLMs. But it seems to me that it might often be important to know whether something has gone wrong because the desired answer isn’t in the space defined by “the training data, plus whatever can be filled in by interpolating from the training data”, or whether the answer is there but is something that the algorithm can’t find. In other words – are there things which a given model can’t do, but might have been able to do if the training set had been different or bigger? Are there things deep in any given model which mean it will almost never get some kinds of question right, no matter how much you train it?
This difference might be particularly important when we’re considering extrapolation from that space – the extremely interesting question of whether LLMs can create something genuinely new[2]. Are we asking whether the algorithm is capable of doing something, or whether it (or sufficient base data to generate it) is in the training set? I don’t know.
[1] This looks like a compliance minefield, doesn’t it? Advertising is quite heavily regulated for the accuracy and decency of the claims made. If ChatGPT says that a product is suitable for me but it isn’t, who do I return it to? Even the shopping buttons themselves have huge potential for mischief for any producer that operates a geographical distributor network or needs to worry about channel conflict. Obviously someone wants to take on Google Ads, but that was launched in 2000 after a lot of thought and analysis, not in mid-1996.
[2] Earlier generations of image generators used to have this “cosmic attractor” property – if you kept asking them to make something “more [adjective]”, they would always end up making a generic science fiction starfield picture which was in some way the “most”. I noticed while trying to recreate this that the latest generations have something to stop them doing that – most of the time, if you say “Make this picture a hundred times more Scottish” or suchlike, it doesn’t really change much at all. I had to really persuade it to even get this, which is only a bit of the way there:
It’s certainly not a “killer app”, but AI is making a great deal of money in the legal space, both in contract preparation and in electronic discovery (the review of massive quantities of documents in litigation).
One issue here is that the comparisons above are consumer ones, but most AI applications I encounter are enterprise, which means they’re invisible to opinionators, who by definition don’t have work to do.