34 Comments
Larissa de Lima's avatar

This really speaks to me as a former math olympiad kid who was for a time a management consultant

I think the post and a lot of the comments are drawing upon a distinction between exploration within a known problem space and the creation of new problem spaces - and whether AI can cross that line.

I really like this post from mathematician Daniel Litt (https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel), who engages with this. He sees little sign of models autonomously building theory, i.e., creating new frameworks rather than proving results within existing ones. The Royen story is a perfect case of the latter. The more recent Erdős unit distance result from May fits the same pattern

When thinking about what makes new-space creation hard to automate, I think it's connected to James C. Scott's concept of metis: messy, contextual, practical knowledge (or what I remember of it). Mathematicians are drawn to structures that feel productive in ways shaped by their moment, their aesthetics, the tools they happen to know, the generation of people they draw upon (connecting to Felix Salmon's comment). That's part of what creates areas that are exploitable by AI (forgotten paths).

But the same contextual pressures that cause mathematicians to overlook certain paths (creating the gaps AI can exploit), I hypothesize, are also what drive them to create new frameworks in the first place. You compress because you have to, and the shape of that compression is itself a creative act shaped by the specific moment, the specific metis. It defines the "taste" of what's worth pursuing and directing one's creativity towards.

That said, this interview (https://www.youtube.com/watch?v=DRcFvXAcxMg) with Tudor Achim (CEO of Harmonic, which tied OpenAI and DeepMind for IMO gold in 2025) is quite the provocation. He describes their process as using hallucination as the engine of creativity: models generate a wide tree of approaches, mostly wrong, and a formal verifier prunes them. Hallucination also injects entropy, novel combinations the model hasn't seen. It could be that throwing scale at the problem then allows some stumbling into genuinely novel territory.

Where I most agree with you is on management being a harder transfer case. The metis in business isn't just hard to codify, it's itself being dynamically redefined, as it feeds on itself. People act against their conditions, which changes the conditions, which regenerates the metis. The frontier is always a moving post. You don't get that with math - unsolved problems don't change their nature just because someone is looking at them.

Ben Recht's avatar

This!

Jordan Ellenberg has written about similar things before in his book Shape, arguing that math doesn’t have a separate word for “problems that are difficult” and “problems that it’s difficult to recognize are actually easy.” Problems that require inventing entirely new fields of math and problems that just require a single unseen connection. Both get called "open problems."

Kaleberg's avatar

The prolific mathematician Erdős used to assign a value to unsolved problems in mathematics. Solving some problems opens new doors. Other problems just add a bit of masonry. Now and then, he would downgrade a problem when it turned out to reveal less new mathematics than he had hoped.

Linking two well known equations to prove the GCI theorem didn't open any new doors. The consensus was that the theorem was correct, and, as it turned out, it didn't take any new mathematics to prove it. Compare that with something like Fermat's famous theorem which required advances in elliptic curves and in modular forms and then required them to be the same. The more modern forms of the proof are a testament to the new mathematical scaffolding that the proof required. (Interestingly, there is an ongoing project to redo the proof in Lean, an automatic proof checking system. This has been far from an automatic process and numerous mathematicians are involved. The outcome may introduce important new tools for the field.)

Tom Slee's avatar

A minor add-on to this eye-opening comment, regarding management. Benjamin Recht, in "The Irrational Decision" makes the point that many management (and political) decisions involve tradeoffs between things that cannot be easily compared, and suggests that these decisions can't easily be "optimized". FWIW I wasn't a big fan of the book overall, but this has stuck with me and, I think, supports your case.

https://www.librarything.com/work/35763698/book/312488789

Dave's avatar

+1 to all of that.

The reason we don't see truly novel concepts out of LLMs is bc they're logically bound to the representations of the semantic data they've been trained with. This is a hard boundary, a speed-of-light-type barrier, laid out with Hume's "problem of induction." That allows LLMs to simulate deduction, because they've seen lots of deductive arguments. But they don't have the theorem provers or math operations to test for logical completeness.

This makes them very good at the "paradigm expansion" phase of a science, in Kuhn's sense, and they can even accelerate that work. It doesn't make them useful for the "paradigm replacement" phase, because they can't encapsulate the abductive reasoning required to play with new semantic concepts that produce new arguments. LLMs can do the "web" part of Quine's "web of belief" but they can't prune or add new truths to the core. Which is just fine. Next token prediction is still really useful for that kind of work.

Eventually we'll switch to SLMs for this work, though. LLMs are too expensive to train and run and 7 trillion semantic connections is way more than you need to profitably hallucinate possible new proofs, or code, or explanations for why trends have shifted.

Matt Woodward's avatar

I'm not sure this is as robustly true as you're suggesting, on the basis that the token generalisation is probabilistic rather than logical or semantic, and therefore not subject to logical or semantic constraints. They can reproduce any output that's statistically permissible within the rules they've derived from their training data, and that can cover an awful lot of territory.

John Harvey's avatar

This is not in my wheelhouse at all but I like your contribution, even if it should later turn out to be wrong. It provokes thought, and introduces weirdness, or confusion. I may be wrong...but you took a swing at the ball!

Sam Tobin-Hochstadt's avatar

Dwarkesh Patel just did a podcast episode with Grant Sanderson (creator of 3Blue1Brown) about AI for math that spent a lot of time on this topic. I don't think they come to any conclusions that these comments haven't mentioned (it all really depends if the models of 2029 can do what Galois and Grothendieck did) but it's a really nice and illuminating discussion.

Andy Berner's avatar

Was just skimming through the comments to see if anyone had mentioned this interview, as I really enjoyed it and it's apropos. Link here to save folks some typing: https://www.youtube.com/watch?v=TfyPshgMbug

Felix Salmon's avatar

I disagree on this. Most science (and indeed most art) has evolved in a standing-on-the-shoulders-of-giants way, where you follow the state of the art to the point at which you can move the ball forward. (Mixing metaphors is fine in blog comments, sorry not sorry.) Of course the problem of modernity is that there are too many giants in too many fields and it's impossible for any human to even be aware of them all, let alone understand their work in a way that will let them synthesize and find innovative cross-pollinations and remixes. If AI can do that then it will be able to generate a large number of very real breakthroughs.

Dan Davies's avatar

I thought that's what I said? Most of science is exactly this, that's why I reply to Tom saying it might take a long time for me to be proved wrong (unless they start producing similar advances in a field where there isn't so much potential for interpolation)

Tom Slee's avatar

This reminds me of "is mathematics discovery or invention?" To the extent that mathematics is the discovery of what is inherently already there, then even with the OP's view, there seems likely to be a path to "a large number of very real breakthroughs" or even "era-defining breakthroughs". To the extent it is invention, who knows?

Tom Slee's avatar

An alternative view is a lecture given locally by Google physicist Adam Brown. (Link below: Readers from here can and should ignore the first 15 minutes of "what is an LLM stuff".) I had considerable resistance for personal reasons to the message, but basically: "LLMs have got amazingly better at maths and (theoretical) physics in the last four years and there's little reason to think they will stop now". I guess you're arguing against the second half of that sentence, and all I can say is you will be proved right or wrong in a ridiculously short time. If you don't get disproved in six months, or a year at the outside, you're probably right. But reluctantly I'd put my money on the other side of the bet.

When it comes to science, though, it's as well to remember that without experiments it's difficult to make advances (see the last 50 years of particle physics) and LLMs will be very clumsy in the lab for some time.

https://www.youtube.com/watch?v=Mw60FH5iflI

Dan Davies's avatar

I think it could take quite a long time for me to be proved wrong; I think there is potentially a huge amount of useful maths and physics in the gaps between the literature which has been created by professional specialization and can be discovered by industrial extrapolation/interpolation.

Dan Davies's avatar

(and I think that being clumsy in the lab might be less of a problem for them than you might think - that's going to be the subject of Friday's post)

Matt Woodward's avatar

Counterpoint: everything that looks like an exponential curve is actually an S-curve, and there's *good* reason to think that LLM capability in general is on the back half of the curve right now. That doesn't imply there can't be major upswings in capability for specific fields, because as we're climbing the back half of that curve we're still crossing "good enough for X" boundaries that unlock new chunks of capability, but naive linear extrapolation is not especially likely to be correct.

Doug Clow's avatar

I defer to Prof Sir Tim Gowers on this stuff - https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29ad73/unit-distance-remarks.pdf - summary:

"Furthermore, even if it is correct that AI cannot yet find a proof that needs a long hint sequence [=earlier work by other mathematicians, ie it cannot find a proof that isn't 'filling in gaps'/'bringing insights together'], such proofs are very difficult to find for humans as well, so in the unlikely event that progress in AI mathematics does suddenly stall, we have still probably entered an era where it will become very difficult for humans to compete with AI at solving mathematical problems."

Dan Davies's avatar

I think I agree with him but I think it might be showing how different mathematics is from the other areas

Doug Clow's avatar

This is the mathematician's version of that bit in the film I, Robot where Will Smith angrily says to the robot, "Can a robot write a symphony? Can a robot turn a canvas into a beautiful masterpiece?" and the robot parries "Can you?".

Almost no humans are capable of maths at the level we're talking about here. We should take care in assessing AI capabilities when we compare them only to the very best humans: on many tasks they are already better than most humans.

And even if/when they do surpass humans at maths, it may not mean the end of human mathematicians. Almost all of my maths-heavy career has happened after computer algebra systems got better than me at maths in the early 1990s.

Philip Koop's avatar

I think you are right. Many proofs are of the "putting 2 and 2 together" sort, and they are very satisfying when you are the one doing the putting together, and the more obscure, the more satisfying. The body of mathematics has grown so vast that no human can keep track of all the 2's; but an LLM can.

However, I think of Peter Scholze, who has said that definitions interest him more than proofs. This is evident in his work (perfectoid spaces, condensed sets), which devises new concepts and new kinds of mathematics that nevertheless can reinterpret existing mathematics in interesting and powerful ways. Is this work in the same category as solving hard problems with existing methods? Maybe, but I do not think so. Can an LLM do something like this in embryonic form that we can then extrapolate from? I don't know of any examples.

Kaleberg's avatar

One big challenge is the combinatoric explosion. This dominates chemistry and biology where the number of possibilities explodes exponentially - literally. The reason that machine learning based protein structure prediction works is because evolution is constrained to build new forms largely from existing forms. Those systems work poorly with novel structures where there is not a big dataset to learn from. They are useful, however, in that we now have a better thumb to use for rules of thumb.

It also dominates mathematics. There was a feeling in the 1950s and 1960s that automatic theorem provers would lead the way to a new era of mechanical mathematics, but any undergraduate who played around with the Boyer-Moore theorem prover, just a toy nowadays, quickly ran into an explosion of possibilities. Take Euclid's five postulates and one quickly gets 5^N possible proofs to explore. As with getting computers to play games like chess, the challenge turned out to be pruning the tree.

Mitch J.'s avatar

People often conflate problems that require intelligence to solve and problems that require information to solve. Mathematics is built out of systems (sets, numbers, groups, rings) that have their own internal logic, where the necessary information to compose theorems and proofs and even build new methods and frameworks can be figured out from first principles. Most practical, non-academic problems don't work that way; dealing with humans in particular is usually dependent on information acquired from one's own internal experience of being a human.

RobS's avatar

That feels quite a lot like the distinction drawn by Professor Feynman between what he called 'Babylonian (empirical) mathematics' and 'Greek (modern axiomatic) mathematics', particularly when mathematics is used in physics. There are many (axiomatically) mathematically equivalent representations of, e.g., the theory of Newtonian gravitation, but they have very different physical interpretations. I think in one presentation he proved that two equations were mathematically equivalent but had very different things to say if one of the masses 'disappeared instantaneously'.

gregvp's avatar

Why not? What is different about the humanities and general management?

Dan Davies's avatar

The relationship between different parts of the literature isn't the same; people disagree with each other and knowledge goes out of date in a way that isn't true in mathematics

Bob Maruca's avatar

I tried to experiment with a use case for AI in humanities scholarship, searching for clues to the location of Nabokov’s fictional college town, New Wye in the novel, Pale Fire. Please check out these posts and tell me what you think: https://bobm858524.substack.com/p/where-is-new-wye?r=bi9a&utm_medium=ios and https://bobm858524.substack.com/p/where-is-new-wye-claude-leaps-in?r=bi9a&utm_medium=ios

Paul's avatar

Not at all qualified to comment on mathematics. But in these questions of conceptual innovation, I'm always struck by the fact that Darwin (on his own telling) got the answer he needed about the mechanism for natural selection by reading Malthus. On paper, political economy is quite outside biology, so it was not an obvious thing to do. Apparently, he only read Malthus because he was "on a break" from smashing his head against the problem of natural selection. It was not that he suspected there might be a connection.

Hard to know if LLMs could replicate that kind of cross-disciplinary connection. I suspect outside of mathematical applications (or formal symbolic systems) a lot is going to depend on how you phrase the question in natural language. If you suspect there is a connection between ideas, you might be able to write the question in a way an LLM could make a connection. But otherwise I think it is going to "understand" natural language concepts the way they are conventionally understood, and unlikely to make the connection if those concepts have not been previously connected. But hard to say for sure.

John Harvey's avatar

This is exactly how you come up with genuinely novel solutions to important problems: get out of your own field, and out of your own head. Didn't that mediocrity Albert Einstein get his ideas from dreaming and sailing, which seemingly had nothing to do with physics or math? That was the point. He got the idea first "out there," then worked out why it was true after.

The aphorism "you need a practical person to solve a difficult problem, and an impractical person to solve an impossible problem" comes to mind.

Maybe the LLM could be trained to look for relevant ideas in other fields? Don't go to the obvious connections, go poking around elsewhere for your patterns or similarities. Maybe that goes against its biases for copying and efficiency, but it ain't efficient if it don't work!

Maybe it needs to become proficient at true nonsense, like jokes. Maybe these Androids are just like sheep and don't dream, period? Maybe a Xerox copier can't make cognitive leaps, or do jazz? Just one plus one equals two?

Human people also have this problem.

The Backseat Policy Critic's avatar

I’m not sure I have a neat way of being able to pithily sum up my thoughts on this one, but here’s a dump of my own reasoning on the matter:

-For a problem to be subject to this, it must be sufficiently abstract that it can be stated in multiple different languages/mental models. The GCI example here worked because it could simultaneously be framed in the traditional geometry approach and also a statistical approach

-Different people have different mental models/languages which they will instinctively attempt to implement (Your post on “if all you have is a hammer” here is excellent). Someone like Richards is naturally going to approach a problem by seeking to transform it into geometric methods, whilst Royen is naturally going to turn it into statistics

-For a problem like this to be solved by this ‘joining up the dots’ method, it requires:

1. A problem that has thus far only been parsed in language A has in fact an answer available when parsed in language B, and 2. Said problem is observed by someone who is naturally inclined to read everything in language B.

-So for an LLM to do this, it must have the wherewithal to take a problem which has only ever been phrased in language A, and attack it using language B.

-LLMs are extrapolation engines, and so when presented with something in language A, when left to their own devices will mine on an industrial level anything that sounds like language A

-Maths is very similar to itself. Even if they operate two different mindsets, language A still operates on similar rules to language B. It is therefore not unreasonable that an LLM may stumble on a proof just by industrialising extrapolation

-More verbal fields are much less delineated. Whilst language A and language B may sound similar, so do a bazillion other languages, and there is a lot more variety of language A to chew through. Expecting an LLM to stumble across the magic solution as opposed to the near infinite dead ends is therefore unlikely.

-An LLM is therefore unlikely to find a solution by itself.

-LLMs seem very good at translating. If you ask an LLM to approach a problem originally in language A, and solve it in a language B mindset, it will probably do a reasonably good job of doing so and showing further implications.

-For the reasons given above, an LLM probably won’t do it by itself, thus if it does it, it’s probably because someone asked it to.

-If a problem is sufficiently complex to have not been solved despite substantial effort, and could be solved with language B, it suggests language B is quite complex to learn and not a natural jump from language A

-If someone is thus inclined to ask an LLM to solve the problem in language B, it is probably because they are already fluent in it, and have an inkling that it is already likely to provide a solution

-If they are already fluent and think it’s worth asking the LLM to solve, then chances are they’ve already roughly figured out the answer themselves.

So I guess my rough synthesis would be that I don’t think LLMs are naturally going to solve thorny problems of their own accord, nor is it very likely that simply asking an LLM to solve things without putting in much additional effort will work. What I think would be more likely to work is asking an LLM “what would Stafford Beer think about this entirely unrelated social science problem”, but it seems like there’s every chance that if you’re doing that you probably already had the thought of a rough idea of a solution whilst brushing your teeth.

One thing that could be interesting is collecting a massive set of ‘thorny problems’, and a massive set of differing disciplines, and telling an LLM to have at it. But again, I would have thought chances are most people will only be familiar with one out of the two ends and so likely won’t know what they are looking at, and if they do, then again, they probably didn’t really need the LLM. So whilst I can think of some edge cases, I think the issue of putting two and two together in an original fashion is still largely going to be a human exercise.

That said, I do think LLMs will dramatically increase the ease at which you can do that - it’s one thing being familiar enough with eg both Stafford Beer and the complex issue of adapting British armoured doctrine to the Normandy campaign, but it’s another to be able to have a machine instantly articulate and validate your hunch that one links to the other.

I had a moment like this just earlier today - having been reading extensively on the various trade and finance issues surrounding Britain in the mid 20th century, it struck me that much of Michael Pettis’ work on China and the US seemed extremely applicable there, and upon feeding that to Claude it took that thesis and ran with it, seeming to think it did a better job of articulating the balance of payments problem than most other literature. I’m not skilled enough at economics or knowledgeable enough on the literature to be able to properly assess it, and it may have just been LLMs getting overexcited, but I could certainly imagine that versions of this from people who actually know what they are talking about may well become more common.

Joe Jordan's avatar

Unlike math or organic chemistry, biology is for the most part a branch of literary criticism. The reason for this is that there are too many biological "facts" at too many length scales to have a single theory that covers it all. This means you the scientist have to pick the salient facts and level of analysis. This is not just connecting dots (which is quite literally what organic chemists do), but also picking which dots to connect. My claim is that such picking requires an intent, which an LLM cannot have. This is also why talk of self-improvement of models strikes me as woolly, because to improve requires a goal, which is not the same thing as an instruction (at least in the common meaning of self-improvement).

Matt Woodward's avatar

I think I'd want to split the domain into "proposing consequential new conjectures" and "resolving existing conjectures", and then the latter into "resolving existing conjectures where doing so requires proposing and resolving other new conjectures", and "resolving existing conjectures where all the mathematical techniques needed to do so are already documented".

That last case is likely to have a big wave of AI-powered successes in the near future, but once that plays out, I think the open question is whether we can train AIs' higher-math reflexes (i.e., the ability to select productive avenues of investigation) to a high enough level that, multiplied by their eventual efficiency in brute-force investigation, they're effective enough to make an impact. Recent and expected-near-future progress on join-the-dots proofs I think can't really give a good sense of which way that's going to go, because it's a different flavour of problem.

Tex Pasley's avatar

I agree, but with the caveat that, in practice, most working mathematicians/scientists/management consultants will not be working with LLMs under the sort of conditions that OpenAI has in place when they want to get a press release out the door. It seems quite possible that a clever researcher with access to handy software (sorta like Claude Science? https://www.anthropic.com/news/claude-science-ai-workbench) could reach an "era-defining breakthrough" with information supplied by the LLM. But odds are that the researcher will still make the important connection while they're standing in the shower.

Jiri Machotka's avatar

In other disciplines it might have a different effect. Rather than producing a "proof", it might serve to find demarcation lines and/or commonalities, or to articulate a perspective from others' positions (e.g., via boundary questions).

I think it could be as useful.