Apparently Nate Silver has a new book coming out, in which (if the reviews I’ve read are representative) he advances a theory of political philosophy based on his experiences in gambling. It set me thinking about the interesting negative template that’s always there when Data Guys get into gambling. There’s always a lot of poker, maybe a little bit of backgammon – card counting at casinos, maybe. You have your Moneyball and its equivalents in sports betting. But it surprises me that you very rarely see Data Science Skills applied to horse racing, even though that’s a very big and liquid gambling market where Big Data is quite easily available.
Here's an exception that proves the rule – Bill Benter. He used computer analysis to beat the bookies at the Hong Kong Jockey Club and took them for hundreds of millions of dollars (i.e., peanuts to the Hong Kong Jockey Club).
And the reason that this is an exception which proves the rule rather than falsifying it is that this could only be done in Hong Kong. Partly because of the way betting works there (it’s a peri-mutual system for those that care; nothing is really gained by knowing what that means other than that it makes it a bit easier to place large bets). But mainly because Hong Kong is unique in having only two racetracks, one of which is much more important than the other. It’s also too far away from other racing markets to regularly move horses there. So it’s pretty much the only place in the world where the same horses race each other, on the same course, very frequently indeed.
This matters, because “horses for courses” isn’t just a fun rhyming thing to say – it’s absolutely fundamental to horse race gambling. Some horses are good at shorter distances, some are stayers. Some of them have the brains to adjust their pace on a hill, others will try to keep sprinting and exhaust themselves. Some of them have a gait that suits muddy conditions, others run better on hard ground. Et cetera …
Racecourses are very different. In the UK, for example, they aren’t even consistent as to whether the horses run clockwise or anticlockwise. The course at Chester is almost circular, which means that racing there is a test of how good a horse is on its front left foot. Perth has a steep slope up to the finish which always catches out front-runners. A sharp bend (like the one at Catterick) will confuse and unsettle a horse that hasn’t seen it before, but not one that’s expecting it. Et cetera …
All of which means that horse racing data has an absolutely massive curse of dimensionality. If you’re trying to estimate a model to use historical race data to tell you which horse out of a field of eight or nine is likely to show up fastest, then you’d have to, at the very least, take into account the course, distance, “going” (ground conditions) and class of each race. Since there are also interaction effects (different courses will suit different horses depending on whether it’s muddy or dry, for example), then you’re eating up degrees of freedom very quickly.
Added to which, there’s a curse of non-stationarity. You want lots of data points to deal with your curse of dimensionality, but very few horses have more than a dozen starts in a season. So the more data you collect, the older some of it is. And a lot can change for a horse in a year – injuries, changes of trainer and the simple effect of maturity and aging.
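To put rough numbers on those two curses, here is a minimal back-of-envelope sketch. Every count in it is an illustrative assumption of mine rather than real racing data (the number of courses, distance bands, going descriptions and classes are all invented round figures), and the function names are hypothetical. It simply counts how many parameters a main-effects-only model needs, how many cells you get if every factor interacts with every other, and how many seasons of one horse's form it would take to collect that many starts at a dozen a season.

```python
from math import prod

# All counts below are illustrative assumptions, not real racing data.
LEVELS = {"course": 60, "distance": 8, "going": 6, "class": 7}
STARTS_PER_SEASON = 12  # "very few horses have more than a dozen starts"


def main_effect_params(levels):
    """Parameters for an intercept plus one dummy-coded main effect per factor."""
    return 1 + sum(n - 1 for n in levels.values())


def interaction_cells(levels):
    """Distinct combinations of conditions if every factor interacts fully."""
    return prod(levels.values())


def seasons_needed(n_observations, starts_per_season=STARTS_PER_SEASON):
    """Seasons of one horse's form needed to collect n_observations starts."""
    return n_observations / starts_per_season


if __name__ == "__main__":
    cells = interaction_cells(LEVELS)
    print("main-effect parameters:", main_effect_params(LEVELS))
    print("fully interacted cells:", cells)
    print("seasons for one start per cell:", seasons_needed(cells))
```

Whatever level counts you plug in, the shape of the result is the same: the cell count multiplies while a horse's starts only add, so the data you would need stretches back far beyond the point where it still describes the same animal.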
This is why bookmakers smile when they meet a quant with a system, particularly if they hear that their new customer makes a lot of money playing poker. Statistical analysis of the British racetrack is basically impossible … or is it?
The data issues I’ve outlined make it more or less impossible to get the kind of model that Moneyballers and poker players want, which gives you a single statistically optimal prediction. But that doesn’t mean the formbook is useless. There are all sorts of rules of thumb and statistical regularities that can be established. (For example, “The sharp turn at Catterick will always catch out a horse that hasn’t raced there before”, as I said a few paragraphs ago.)
Really good racing analysts can systematically beat the odds by knowing a lot of these little rules of thumb, being really familiar with the form and having enough experience to know which statistical regularities are most salient for any given race. They are also good at spotting which races are easiest to analyse and most likely to offer attractive odds, and good at not betting on the other ones. (I’ll talk about my own system on Friday if you like!)
I think most of the world is more like horse racing than it is like poker. (I notice that when the Data Guys start trying to make money in finance, they tend to gravitate toward derivatives and high-frequency trading. “Horses for courses” is a phrase you will often hear in the stock market, because companies and industries have even more variety than racecourses do). That’s why most cybernetic schemes to improve the human condition tend to fail.
And I think that’s my considered response to a lot of the comments on my James C Scott posts. To a large extent, the debate about techne and metis, technical quantifiable knowledge versus embedded and tacit skill, is really about the curse of dimensionality. There’s a point at which, to quote Ben Recht, “math becomes metaphor” and the skill of prediction and management is one of knowing which parts of the mental model to apply; while you learn these things in mathematical terms, the algorithm doesn’t help you any more.
You might really enjoy the section in McGilchrist's The Matter with Things about Franck Mourier, a horse trainer turned bookie who beat the professional oddsmakers. Yet he had no idea how to codify his expert intuition, and any attempt to do so degraded the signal.
Agree with you. Most of the important things in the world are not susceptible to statistics, and most of the errors humans make are not statistical ones.
BTW :-) it's "parimutuel" https://en.wikipedia.org/wiki/Parimutuel_betting not "peri-mutual", unless you mean the Portuguese version they used in Portugal's African colonies: https://www.thespicehouse.com/products/peri-peri-mozambique-blend?srsltid=AfmBOopTwOkvGXuCNI_ZXyRtPRvXxgO1OJzk9JGwe5DveZVLjsIWLL-0