It's certainly not a "killer app", but AI is making a great deal of money in the legal space, both for contract preparation and in electronic discovery (review of massive quantities of documents in litigation).
I'll add to this with what is probably an overly long comment, speaking as a lawyer who has done a fair bit of electronic discovery work and is at least technology-curious (if not technology-competent).
In discovery, "technology assisted review" (TAR) has been around for awhile and may be what Dan is talking about. The technology has obviously progressed but it predates LLMs by decades. Basically, the idea is more "traditional machine learning" because it works by having a human identify certain relevant documents within a training set and then learning characteristics of the document that make it relevant. You can then run the TAR model over the rest of your dataset to extract documents that you're pretty sure are relevant. It works pretty well, but in the U.S. at least it's a bit maddening because there are still case-by-case arguments about whether this is appropriate.
LLMs specifically haven't overtaken the discovery space (as far as I can tell), but I think the biggest near-term innovation will be the simple ability to construct search terms using natural language. So you can just say "Find me all the emails sent by the CEO from [date] to [date]" instead of having to hack out a clunky Boolean search string.
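To make the contrast concrete, here is a hedged illustration of the kind of query string being replaced; the field names and syntax are hypothetical, not those of any particular review platform:

```python
# Hypothetical: the structured Boolean string an LLM front end could generate from
# "Find me all the emails sent by the CEO from [date] to [date]".
def build_query(sender: str, start: str, end: str) -> str:
    # FROM and SENTDATE are illustrative field names; real platforms differ.
    return f'(FROM:"{sender}") AND (SENTDATE >= {start}) AND (SENTDATE <= {end})'

print(build_query("ceo@example.com", "2023-01-01", "2023-06-30"))
# (FROM:"ceo@example.com") AND (SENTDATE >= 2023-01-01) AND (SENTDATE <= 2023-06-30)
```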
The "old-school" TAR is using natural language processing techniques which predate the invention of the transformer in 2017 (which is really the step-change that makes GPT-style LLMs something new and different), but I would still call it "AI." This is a somewhat related aside but I just found this paper on the history of information retrieval that I am excited to read (https://ieeexplore.ieee.org/document/6182576).
I'm less familiar with contract review generally, but my understanding is that this has been around for a while in the sense of, for example, NLP tools that can detect certain key clauses. These are useful for companies that manage tons and tons of contracts, where the company saves meaningful money by having a machine tell it to cancel a contract that's set to auto-renew.
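The "key clause" detection described here can be as simple as pattern matching before anything fancier is needed. A rough sketch, with invented contract text and an invented pattern:

```python
# Toy example of flagging auto-renewal clauses across a pile of contracts.
# The pattern and contract snippets are made up for illustration.
import re

AUTO_RENEW = re.compile(r"automatic(ally)?\s+renew|auto-?renew", re.IGNORECASE)

contracts = {
    "vendor_a.txt": "This Agreement shall automatically renew for successive one-year terms...",
    "vendor_b.txt": "Either party may terminate on thirty days' notice...",
}

for name, text in contracts.items():
    if AUTO_RENEW.search(text):
        print(f"{name}: auto-renewal clause found - check the cancellation window")
```

Commercial tools layer statistical models and clause libraries on top of this, but the underlying value proposition is the same: surface the clause before the renewal date passes.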
I know a bit about electronic discovery, too. It's been my main practice area for the past 20+ years. I brought TAR into the best litigation firm in the US, advised the White House on electronic discovery, and led eDiscovery projects in cases that generated major headlines. TAR/machine-assisted learning is important, but it's quite a different animal from what's being done with AI. What's currently happening is that a reasonable set of prompts, without machine learning, is generating 80% recall and 80% precision, which courts and opposing counsel generally find acceptable.
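For readers outside the field: recall is the share of truly relevant documents the process finds, and precision is the share of flagged documents that are actually relevant. A quick sketch with invented counts, just to show what "80% recall / 80% precision" means:

```python
# Illustrative only: the numbers are made up.
def recall(true_positives: int, false_negatives: int) -> float:
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    return true_positives / (true_positives + false_positives)

# Suppose 1,000 documents are genuinely relevant, the review flags 1,000 documents,
# and 800 of the flagged documents are genuinely relevant.
print(recall(800, 200))     # 0.8 - found 80% of what is actually there
print(precision(800, 200))  # 0.8 - 80% of what it flagged was worth flagging
```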
Conceptually it may be incremental; in terms of effort and cost reductions it's a step change.
I think that's very much an incremental gain and something that is attributable to machine learning in general rather than AI in the sense of LLMs. One of my friends was doing big data document trawls back in the 00s.
I am not enough of a technical expert to knowledgeably reply about the differences between machine learning and LLMs. What I can say is that this looks to me like it's very LLM/token based. And it's certainly incremental overall, but very large in the document review space. Note that this is not about the collection of documents, but rather about the review of them. Reportedly, they are achieving results comparable to the near best of human review, and certainly above the level that courts have accepted. All that said, this is obviously a highly limited domain, and I'm not sure how much these results can be extrapolated more generally or even to other domains.
My 2c on legal AI generally is that although there is a lot of VC heat in the legal space, I'm not really sure how it will shake out. I think something is bound to work, but my general sense is that people are (to a degree) overhyping "the magic chatbot that will draft important documents for you" and underhyping the benefit of being able to quickly find what you need in a computer database using natural language.
Another thing is that legal data is uniquely hard to access and train on. I think one reason we're seeing so much progress on domain-specific models in coding is that so much code is available on the public internet.
Being in the legal engineering space, I think there's a philosophical gap between generative AI and "truth". Sure, you can use it to fill in prior templates, but it doesn't have awareness of the legal meaning behind words. In addition, regulations and contracts have prohibitions and restrictions, which under traditional AI requires deontic logic (https://en.wikipedia.org/wiki/Deontic_logic).
Courts have already rapped knuckles for hallucinated case citations and incoherent logic, so would you give an AI errors & omissions insurance?
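To make the deontic-logic point above a bit more concrete, here is a toy sketch (the rules are invented, and this is a gross simplification of the standard system described at the Wikipedia link): obligations, permissions and prohibitions are explicit symbols you can mechanically check for consistency, rather than statistical patterns in text.

```python
# Toy deontic check: in standard deontic logic, "obligatory" implies "permitted",
# so an action cannot coherently be both obligatory and forbidden. Rules are invented.
from enum import Enum

class Modality(Enum):
    OBLIGATORY = "obligatory"
    PERMITTED = "permitted"
    FORBIDDEN = "forbidden"

contract_rules = {
    ("supplier", "provide_audit_report"): Modality.OBLIGATORY,
    ("customer", "assign_contract"): Modality.FORBIDDEN,
}

def check_consistency(rule_sets: list[dict]) -> list[str]:
    """Flag actions that one rule set makes obligatory and another forbids."""
    conflicts = []
    merged: dict = {}
    for rs in rule_sets:
        for key, modality in rs.items():
            prior = merged.get(key)
            if prior and {prior, modality} == {Modality.OBLIGATORY, Modality.FORBIDDEN}:
                conflicts.append(f"conflict on {key}: obligatory vs forbidden")
            merged[key] = modality
    return conflicts

# A hypothetical regulation that forbids what the contract obliges:
print(check_consistency([contract_rules,
                         {("supplier", "provide_audit_report"): Modality.FORBIDDEN}]))
```

A generative model, by contrast, has no such explicit representation to check, which is the gap being pointed at.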
One issue here is that the comparisons above are consumer ones, but most AI applications I encounter are enterprise ones, which means they're invisible to opinionators, who by definition don't have work to do.
The AI side is mainly enterprise - in fact, it's surprisingly difficult to get any numbers for consumer use, presumably because it's being given away free. As I say, one of my friends who's thanked in the acknowledgments to "Lying for Money" was using machine learning on legal documents back in the 00s, so there's definitely value there - I just question whether you would be able to put your finger on a chart of the sales and usage of broadly-defined AI products and say "here's the OpenAI and LLM revolution".
It's interesting that you are writing this now, because I've just been thinking about some of the limitations of AI in the context of some of the discussions we've had here before about black boxes.
A while back we were all having a big discussion about the merits of the notion of 'don't open the black box': whilst on one hand there are merits to avoiding bothering yourself with unnecessary information, on the other hand, digging under the covers of how something actually works is often very useful for analysing and controlling something's input and output, and redesigning it to function better. The broad consensus, if I remember correctly, was that the notion of "don't open the black box" actually made more sense in reverse - i.e. be willing to turn some things into black boxes when there are extremely low returns to the effort of understanding.
This, however, got me thinking about the circumstances in which it would actually be appropriate to 'close a black box around something', so to speak, and it is here where I think AI is really going to come undone. With something like a car, you have a perfect example of what I would call a 'functional black box' - you do not need to know the full science and physics of how a car works in order to know how to drive it. However, the important thing is, should the car go wrong, the nature of how it is constructed means it is nonetheless possible for someone to understand how it works and thus modify it to fix it, or otherwise cause a desirable outcome. In other words, whilst it can be black boxed, the black box is in itself openable, and it's simply a question of your role as to whether or not it's worth the effort of doing so.
However, AI seems to be different. As far as I can tell, the bit where AI has actually seen an innovation is essentially in raw statistics - feed a tonne of data in, get essentially a super fancy average of it out. However, the nature of this functioning means that it is essentially impossible to know what exactly is going on, as you are essentially witnessing a bunch of controlled randomness at play - whereas a car mechanic can directly link given behaviours to given structures, and understand how changes to the structure will affect its behaviour, the best an AI or data technician can do is alter the inputs and design and hope; when you ask about a specific outcome, all they can do is shrug. In other words, AI is not only a black box, but a black box which it is impossible to open.
The significance of this for me links back into a lot of the stuff you said in The Unaccountability Machine about outsourcing. Fundamentally, a lot of problems in business emerge out of companies essentially placing functions outside their own systems such that they cannot control their behaviour or fix problems that emerge as a result - essentially black boxing their way to failure. Based on the above assumptions, it feels like any situation where a task is outsourced entirely to AI (as opposed to AI being a tool strictly under the use of a particular person) is essentially going to be this on steroids - not only does the company have no idea how to self-correct, but literally nobody does, because it's essentially an unpredictable force of nature. At this point, you are going to fail the fundamental requirement of viability - that of being able to consciously adapt your structure to the environment. I also think this is why people are so instinctively against AI making decisions - they know that AI is essentially an inscrutable oogie boogie machine that makes decisions for completely different reasons than the right ones, so even if it does usually make the correct ones, there's nothing you can do to make sure that stays the case.
Isn’t the key barrier to adoption the point at which you can safely accept an answer that the bot gives you to a question to which you don’t already know the answer and can’t (or don’t have time to) check it yourself? Get there and everything changes forever, or so it seems to me. Short of that, not so much.
If that's the requirement then it will never happen (or at least we are no closer to it than pre-ChatGPT), because the fundamental systems that have been innovated obtain answers using methods that are completely disconnected from what makes the answer the right answer. To use an epistemology example, AI is essentially one giant Gettier case - a belief that happens to be true and looks justified, but not for reasons connected to why it's true.
To be clear I am not in any way suggesting it will happen, or how, or when. But I do think that a lot of the ideas for how AI will be used have an unspoken (or spoken) premise that we’ll get there someday soon.
I concur with this answer.
Good post. It makes me think of the graph that's been doing the rounds recently, proudly titled 'The length of tasks AI can do is doubling every 7 months' and then with '(at 50% success rate)' in the subtitle... the importance of those two things might need to flip for any killer-app effect.
Personally speaking, I was until lately in the very pessimistic camp about AI use cases. However, the new research features in the famous LLMs have been very useful. They do not produce anything new, and definitely do not produce anything extraordinary, but they do produce reports that would take someone average but hardworking a few weeks to compile. That takes 5 minutes or so on Gemini. I think the productivity gain (speaking of current tech, not AGI or whatever snake oil is on the market) is that the idiots in any organisation become much more competent and, more importantly, faster. Idiots can work with less supervision because they have a smarter idiot they're checking in with. And of course it goes upwards too: when you're reporting to an idiot and they demand needless work ... voilà, it's available on demand.
I looked for Haggis in your image and did not find it.
"my gut feel is that “AI” in the sense of “applications of LLMs” are behind the curve – there’s a lot of people selling picks and shovels, but nobody seems to be mining gold. There’s still no “killer app” in the sense of something that people and businesses will make a big capital investment simply in order to use"
The first killer app is the chatbot itself. Multiple serious research efforts have found that it is sharply impacting productivity at economy scale. One example: https://www.nber.org/papers/w32966
"We don’t seem to make this distinction when talking about LLMs. But it seems to me that it might often be important to know if something has gone wrong because the desired answer isn’t in the space defined by “the training data, plus whatever can be filled in by interpolating from the training data”, or whether the answer is something that the algorithm can’t find. In other words - are there things which a given model can’t do, but might have been able to if the training set had been different/bigger?"
Two concerns you touch on in this paragraph are the subject of serious informed discussion. First, the tools are getting much better at acknowledging what they don't know. Second, a core debate in projecting improvement is how much it will be limited to "closed domains" such as math and coding, where verifying answers is automatable and quick, versus open domains such as, well, most endeavors: economic forecasting, poetry, architecture, ...
I am using the original definition of killer app, which is to say "software sufficiently valuable to drive hardware purchases" - I don't think I could extend it far enough to something which is being given away free or sold at a loss.
I can't access anything beyond the abstract of that paper on this device - the abstract only seems to talk about it being used a lot, and also in many contexts where it looks to me as if the use is "replacing or augmenting web search" rather than anything which I would want to bet on being able to see in productivity data. Since the internet itself famously took a very long time to show up (and many would argue still hasn't), the skeptic position still looks defensible to me. Do they make stronger claims in the paper beyond the abstract?
"killer app": I see your logic, but it is strange to me. That the value is being delivered with fewer steps than in older go-to-market approaches seems to me an argument in its favor. As for "given away free or sold for a loss", ChatGPT is one of if not the most used software products ever. I am comfortable presuming they can monetize at great scale and engineer costs, technically and/or politically.
Stats from the paper (2024):
- Used by 40% of U.S. adults
- Adopted faster than PCs or the Internet
- 10% of workers use it every day
- 1-5% of all work hours
- time savings ~= 1.4 percent
Six main claims; here is the last: "Sixth, we provide a rough estimate of the early impact of genAI on aggregate productivity using data from self-reported time savings. We ask users how many additional hours they would have needed to complete the same amount of work in the previous week if they had not had access to genAI. We estimate a mean time savings of 5.4 percent among all genAI work users, which implies a mean time savings of 1.4 percent among all workers (including non-users). Time savings are highly correlated with the intensity of genAI use and thus vary widely across occupations and industries. Using a standard model of aggregate production, we estimate a potential aggregate productivity gain of 1.1 percent at current levels of worker genAI usage. This is similar to Acemoglu (2024), who estimates a potential productivity gain of 0.7 percent using estimates of task-based exposure to genAI rather than actual adoption data."
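To make the arithmetic in that quote explicit, here is a back-of-envelope reading of the quoted figures only; the implied share of workers is derived from them, not reported directly in the excerpt:

```python
# Back-of-envelope from the quoted figures; illustration, not the paper's own calculation.
savings_among_users = 0.054   # mean time savings among genAI work users
savings_all_workers = 0.014   # mean time savings across all workers, per the paper

implied_user_share = savings_all_workers / savings_among_users
print(f"implied share of workers using genAI at work: {implied_user_share:.0%}")  # ~26%

# The paper's 1.1% aggregate-productivity figure then comes from plugging these time
# savings into a standard aggregate production model, which is not reproduced here.
```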
... but there are many other such studies, for example https://www.oneusefulthing.org/p/the-cybernetic-teammate?publication_id=1180644&post_id=159529723 and linked paper.
NBER tip which works for me: the link from the abstract to the paper is gated, but the direct URL pattern is simple: https://www.nber.org/system/files/working_papers/w32966/w32966.pdf . LMK if you'd like me to DM you a copy.
thanks, that works - I think this is the substance of my other point though - a perceived 5% gain for an individual absolutely does not translate to a 5% gain in output for the organisation. Aggregating tasks doesn't work this way and it's a real blind spot of economists that they have a model of the firm in which it might.
"a perceived 5% gain for an individual absolutely does not translate to a 5% gain in output for the organisation" — well no, but there are many lines of evidence suggesting many jobs are becoming much more productive. Will it take time for organizations to, well, reorganize? Sure. But the sheer scale of what's happening is the story IMHO. Two "slowly ... then all at once" tweets from Ethan Mollick today:
- AI tutors are *extremely* effective today: https://x.com/emollick/status/1917253290791956578
- Anthropic believes that AI engineering tools will be more productive than the best people in the next couple years: https://x.com/emollick/status/1917698375115321714
(The latter, as he notes, is Anthropic talking their book ... but that doesn't mean they're incorrect.)
Data pointing in the opposite direction: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5219933
"users report average time savings of just 2.8% of work hours"—"just" is contradicting by 2.8%. That's a lot, in 2023(!) and 2024. Meanwhile, for a year and with no signs of stopping:
- the tools have gotten better
- they are more widely adopted
- we're learning how to use them more effectively and get to the firm-level effects that they didn't find
In this case and in Dan's "another guy in India", which just came out, I think we have a simple conflict between Dan's deductive reasoning from what he knows and my inductive reasoning from what I'm learning on the job, by research, and most importantly by a different kind of systems thinking: we are in an acceleration. Yes, inertia is powerful. But it is not all-powerful.
My point is just that there’s research that swings the other way too.
(And, like, if you’re taking papers that swing one way at face value and critiquing/declaring as outdated the ones that swing the other way, that’s maybe not great for credibility 🙂)
> if you’re taking papers that swing one way at face value and critiquing/declaring as outdated the ones that swing the other way
That's a good heuristic for assessing people's credibility. Fortunately there is a lot more in my comments to assess ;-) .
This is a misapprehension of “killer app”. Chatbots aren’t profitable.
I think the question with extrapolation is not “can the AI create something genuinely new?” (the answer is “yes”), it’s “can the AI create something genuinely new that’s also genuinely good?”, and if so, “can it do so consistently?”. Extending trend lines outside the “training point cloud” is straightforward to do; the concern/issue is that the functions defining those lines have undefined quality outside said cloud. My naive assumption would be that there will be some cases of real value being created by LLM extrapolation, by virtue of extremely large (but finite!) numbers of monkey typists being employed, but that the typical attempt will produce useless junk.
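A small, self-contained illustration of the point about trend lines outside the training cloud (the data and model are synthetic and invented for this sketch): a model that fits beautifully inside the range it was trained on can behave arbitrarily badly just outside it.

```python
# Fit a deliberately flexible polynomial to noisy samples of a simple curve, then
# evaluate it both inside and outside the training range. Inside the cloud the fit is
# fine; outside, the quality of the extrapolation is essentially undefined.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.05, x_train.size)

coeffs = np.polyfit(x_train, y_train, deg=9)

inside, outside = 0.5, 1.5
print("inside training range :", np.polyval(coeffs, inside),
      "vs true", np.sin(2 * np.pi * inside))
print("outside training range:", np.polyval(coeffs, outside),
      "vs true", np.sin(2 * np.pi * outside))
```

The analogy to LLMs is loose, but the shape of the concern is the same: the machinery extends happily beyond the data; the guarantee of quality does not.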
I discussed some of these issues recently. The best use case for AI, IMO, is combining a high-quality search engine (something Google could have developed long ago) with an AI text summariser to produce an automated literature survey. That's what Deep Research does:
https://johnquigginblog.substack.com/p/not-so-deep-thoughts-about-deep-ai
I have been playing with Local Deep Research on my new Mac, but although it had some interesting insights in the reasoning phase, it eventually decided it wasn't going to answer the question. (This was using Gemma3-12B as the main model.) I may need to hack on it some more.
"I’d be tempted to match this up to a corresponding point in the public release of Netscape Navigator, in December 1994 (again, arguable, but let’s go with it). That would imply that LLMs If you search for something and don’t find it, one of two things can be the case. Either “Google can’t find it” (it’s not on a page that’s indexed by search engines, or it’s buried in SEO spam, or the specific search term you used wasn’t the right one). Or, “It’s not on the internet”"
This is only true of LLMs if you are using them without adding your own documents (or other media). You can use the LLM as a chatbot interface to your own documents, whether session-based (context) or permanent (RAG). Then the LLM can use its algorithms to extract the data from these documents. This is appropriate for companies with many documents useful for various functions. It is useful for individuals who have specific documents in digital form (or made digital with OCR) to interrogate. For example, you could OCR an insurance document for the LLM to answer questions about coverage. [Wouldn't it be useful if, before you buy a policy, an LLM could answer key questions about the policy and any gotchas?]
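A minimal sketch of the retrieval-augmented pattern being described: the retrieval here is crude word overlap rather than embeddings, the documents are invented, and the final LLM call is left as a printed prompt rather than a real API call, since the point is the shape of the pipeline rather than any particular product.

```python
# Toy RAG pipeline: split documents into chunks, retrieve the chunks most relevant to a
# question, and stuff them into the prompt. A real system would use embeddings and an
# actual LLM call; both are deliberately stubbed out here.
def score(chunk: str, question: str) -> int:
    return len(set(chunk.lower().split()) & set(question.lower().split()))

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]

def build_prompt(chunks: list[str], question: str) -> str:
    context = "\n\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

policy_chunks = [
    "Water damage from burst pipes is covered up to $10,000 per incident.",
    "Flood damage from external sources is excluded unless a rider is purchased.",
    "The deductible for all property claims is $500.",
]

question = "Is flood damage covered by this policy?"
prompt = build_prompt(retrieve(policy_chunks, question), question)
print(prompt)  # in a real system this prompt would be sent to the LLM
```

The design point is that the model only sees the retrieved context, which is what lets it answer questions about documents that were never in its training data.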
Coding is one example that could help coders and non-coders solve specific problems. Another is helping with math problems - in my case, advanced math.
Now this is all clearly harder than just typing into a search box or clicking through links, but I think there will be software to make this all much easier, and importantly, able to run on a local computer for privacy. IMO, the LLM is just the technology that will facilitate something like an advanced browser interface for various applications. I don't think it will do well with applications like comparison shopping because it can be gamed to provide a biased response. But it will be a useful speech interface, and I expect that useful conversational language translators will appear as local applications superior to the rather clunky Google Translate. They will be useful for drivers requesting information and help on the road. I have no doubt that specialized LLMs will prove useful in much the same way as books do.
But AGI - this is a fantasy. And weaponized LLMs in commerce, social media, and warfare worry me - a lot.
just to be clear - you've run together two separate bits of the post there. In the first part I'm wondering about commercial impact and I'm sure that there are material valuable uses - I just want to use the 1990s Internet as a yardstick to say "since the amounts of capex and stock market valuation are comparable, are we seeing benefits as big and as quick as going from no internet to internet"? In the second bit I'm trying to think through the limits of AI (on the basis of LLMs) and the question of what limitations are coming from the dataset and what are intrinsic.
IIRC, economist Robert Gordon couldn't detect any productivity gains from computerization and the internet. [I believe he corrected that with more time.] While I cannot do without the web, my [more] elderly neighbors refuse to have computers in the house or even use smartphones. Any web benefits are very indirect for them. But the web comes at a cost, requiring an arms race to protect oneself from malware, viruses, and scams, to say nothing of malfunctioning software and computing systems. However, if the Internet was so valuable, why did the NASDAQ gain so much and then collapse in 2000? Was the market reflecting value or just animal spirits? LLM AI seems to be doing the same now.
As for LLM limits: what I think we are seeing is that just throwing ever more data into training, as hyperscalers like OpenAI do, does not seem to improve performance. Smaller LLMs with better-curated data seem to be just as effective. The push to AGI may be a big mistake with current architectures, and I applaud the direction of more specific training with better-designed data sets in different domains. You may have seen the release of a new math data set that claims to significantly improve reasoning on math problems that are far more difficult than I can solve.
But longer term, I think that different architectures will be needed to get more human-like performance, although ultimately, such AIs will need to be embodied to interact with the real world, or a very realistic simulation of one.
But I caution: LLMs are the first truly cognitive technology. This is very different from all prior technologies, even the AI techniques of the past. We are going to be living in a world where such intelligences will have a huge influence on societies, and so far, it seems, not in a good way. Such trends will not make life easier, but rather harder and fraught with pitfalls. My wife will not even talk about AI, and there are times when I think her fears (based on her education as a historian) may be well founded.
RAG is *the* key feature for generative AI in the enterprise. Possibly the key feature, full stop. We'll see a hell of a lot more of those, as it's already become productized (AWS Bedrock has a built-in RAG tool).
the thing that impressed me most about the civil service "Humphrey" was its ability to produce a tagged database of correspondence, something I used to do manually. The issue for me is not so much whether it's a viable commercial product (I know it is, because I've spent a decade selling the exact same product powered by an office full of people in India) as whether it's transformational at the macroeconomic level (views on this also informed by said decade)
that said our new corporate 'bot just gave me wrong financial numbers for a large public company, so...
If I had had a close personal relationship with Stephen Hawking, so that I could have just called him up anytime when he was alive, I'd have had zero idea what to ask him. That's how I've been thinking about potential uses for AI. (I wouldn't be mortified to ask stupid questions of the AI, but I also probably wouldn't waste the time.)
Did this fear ever stop you from asking questions of your teachers or professors? If not, why would you be intimidated by an AI? You can even prompt the AI to give answers appropriate to your education, and it will patiently interact with you as you ask for explanations of details you don't yet understand. It won't judge you. It should be the easiest thing to ask questions of.