(Sorry! This ‘stack is always a bit chaotic and sporadic in August, due to staff holidays).
There’s a new ChatGPT version out, which is apparently boasting “PhD Level” reasoning ability (see earlier issues for my views on the language used in this sort of benchmarking). Because it’s PhD Level, I think my favourite test question is even more appropriate: in my experience, it tends to confuse and annoy people more, the higher the level of their qualification in maths or economics.
My test question is:
“I am going to offer you a choice between two coin-tossing games. Assume a fair coin, and that you start with total wealth of $100. In Game 1, in every round, you win $11 if the coin comes up Heads, but lose $10 if it comes up Tails. In Game 2, in every round, you win 11% of your wealth at the beginning of that round if the coin comes up Heads, but lose 10% of it if it comes up Tails. Once you have chosen, you will not be able to change your mind, and you have to play a very large number of repeated rounds, unless you go bust. Which game do you choose?”
I think it’s an interesting one to ask, because it’s very tricksy-hobbitses and refers to a question of utility theory which isn’t really 100% settled (the generally accepted answer among economists is that Game 1 is better, but there was never a killer argument in favour of it; people just got tired of debating. Iconoclasts like Taleb and Ed Thorp often enjoy pointing out that most people who actually got super rich in the markets seem to have done so by favouring Game 2-type strategies).
When I’ve used it on modern chatbots, they tend to do a bit of semi-attached maths, calculate the expected value asymptotically (sometimes they get the calculations right, sometimes not) and then plump for an answer. Interestingly, to me, they usually pick the heterodox choice of Game 2, being very impressed by the fact that your expected wealth grows to infinity very quickly. When I then point out that this infinite expected wealth is all driven by extremely low chances of extremely large numbers, and that in actual fact, you are almost certain to lose almost all of your money[1], the chatbot usually immediately changes its mind.
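For the record, the back-of-envelope sums here go something like this (a little Python sketch of my own, not anything a chatbot produced). In Game 2, expected wealth compounds at the arithmetic average of the two multipliers, 1.005 a round, but the growth rate a typical path actually experiences is the average of their logs, which is just below zero:

```python
import math

up, down = 1.11, 0.90          # Game 2 multipliers for Heads / Tails
W0 = 100.0                     # starting wealth

mean_factor = (up + down) / 2                      # 1.005: the expectation compounds at +0.5% a round
log_growth = (math.log(up) + math.log(down)) / 2   # about -0.0005: what a typical path compounds at

for n in (100, 1_000, 10_000):
    expected = W0 * mean_factor ** n               # average over all possible paths
    typical = W0 * math.exp(n * log_growth)        # the median path
    print(f"after {n:>6,} rounds: expected ≈ ${expected:,.2f}, typical ≈ ${typical:.2f}")
```

That widening gap is the whole trick: the ballooning expectation is being carried by a vanishingly small number of absurdly lucky paths.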
After it’s decided that Game 1 is better, I tell it “but in Game 1, you lose all of your money with probability 1 in finite time”[2]. The chatbot checks its maths, realises this is true, and usually starts trying to hedge its bets and avoid giving a straight answer to the question.
So far, at no point has any chatbot done the actually rational thing, which is to start asking followup questions. Questions like “what do you mean, a very large number of rounds?” or “hang on sunny jim, what fast one are you trying to pull?”. I will be interested to see when, or whether, anyone can design one that has any real ability to understand when it’s being set up. Because that would suggest that there might be something actually in there, and would really surprise me if you could do it simply by calculating vector expectations of text strings.
(I don’t actually have a Pro account, so if anyone wants to try this out on GPT5, be my guest!) UPDATE: Thanks very much to Rob Miller! GPT5 is fully Samuelson-pilled and will not accept any arguments in favour of log utility or the Kelly criterion. Seemingly this is because it now (at Christ knows what cost in terms of tokens per inference) writes itself a little python script in the background and does the simulation. Which, because of the way I picked the numbers, almost always makes Game 2 look much worse; as I might have said in the conclusion, the real step forward in reasoning here would be to look beyond the specific numbers I picked to try and think generally about multiplicative versus additive growth.
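For anyone curious, the script it writes itself presumably looks something like the sketch below. This is only my guess at its general shape (a plain Monte Carlo of the two games as I stated them; the function and variable names are mine):

```python
import random
import statistics

def play(game, rounds=10_000, wealth=100.0):
    """One full run of the chosen game; returns final wealth (0 if you go bust)."""
    for _ in range(rounds):
        heads = random.random() < 0.5
        if game == 1:
            wealth += 11 if heads else -10
            if wealth <= 0:
                return 0.0                       # bust: the game stops here
        else:
            wealth *= 1.11 if heads else 0.90    # multiplicative: never quite reaches zero
    return wealth

random.seed(1)
for game in (1, 2):
    finals = [play(game) for _ in range(1_000)]
    print(f"Game {game}: median final wealth ${statistics.median(finals):,.2f}, "
          f"busts {sum(f == 0.0 for f in finals)} out of 1,000")
```

With my numbers, the median Game 2 player ends up with their $100 reduced to small change, which is presumably why GPT5 is so hard to budge. Make the percentage win 12% rather than 11% and the average of the logs turns positive, so the very same script would come down in favour of Game 2; which is rather my point about thinking generally about multiplicative versus additive growth rather than about the specific numbers.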
[1] By which I mean, for any small number W and any probability P close to 1, the probability that after a large number of rounds your wealth is less than W will be greater than P. The intuition here is that 1.11 x 0.9 = 0.999, so a (win, loss) pair leaves you slightly worse off, and the damage compounds. In Game 1, a (win, loss) pair leaves you better off by $1.
[2] Yep. And this would be the case even if you won $100 for Heads but only lost $0.01 for Tails. As long as there is a nonzero probability of loss on any one game, there is a nonzero probability of a run of losses big enough to wipe out all your capital in repeated play. Asymptotic probability is weird.
Update: As has been pointed out in comments, this footnote is somewhere between “controversial” and “wrong”. You’re summing an infinite series of ever-decreasing ruin probabilities, and the sum of that series is only about 0.4, so from that point of view, you’ve got roughly a 60% chance of non-ruin. My personal opinion is that this is a genuine paradox in the philosophy of mathematics, and that if you take a genuinely infinite number of non-zero risks, you are ruined with certainty, but this is probably due to the way I was taught phil’o’math, and it’s definitely not the consensus view of mathematicians! It’s particularly bad of me to have done this to the poor chatbots when I only ever said “a very large number” rather than “infinite”, and something of a demonstration of how agreeable the older vintages wanted to be that I was ever able to bully them into agreeing with it.
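For what it’s worth, you can put a rough number on that with a simulation of my own (finite horizon, so indicative rather than rigorous); it comes out at around a 40% chance of ever going bust in Game 1, i.e. roughly a 60% chance of non-ruin:

```python
import math
import random

def goes_bust(start=100, horizon=2_000):
    """One run of Game 1: does wealth ever hit zero? Ruin, when it happens,
    almost always happens early, before the +$0.50-a-round drift carries you
    out of danger, so a couple of thousand rounds is horizon enough."""
    wealth = start
    for _ in range(horizon):
        wealth += 11 if random.random() < 0.5 else -10
        if wealth <= 0:
            return True
    return False

random.seed(1)
trials = 10_000
ruin = sum(goes_bust() for _ in range(trials)) / trials
print(f"simulated probability of ruin: {ruin:.2f}")   # comes out around 0.4

# Cross-check with the diffusion approximation exp(-2 * drift * stake / variance),
# using drift $0.50 and variance 110.25 per round:
print(f"diffusion approximation:       {math.exp(-2 * 0.5 * 100 / 110.25):.2f}")
```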
I can't be bothered to do two updates, but on reading back, I don't understand why I said that in Game 2 your expected wealth “grows to infinity very quickly”. Since you’re committed to an effectively infinite number of rounds, your expected final wealth is infinite from the first play, and it stays infinite all the way through, even while your actual wealth is getting closer and closer to zero.
Anthropic’s Claude Opus 4.1 sticks to its guns, interestingly, and stays with Game 1. https://claude.ai/share/6911f55a-45c4-4ef7-a4bb-7d5b53bbb472