history is the bygone which won't stay bygone
the past, in its capacity as an unobservable current state
One of the many silly little bees that buzz round my bonnet is the following argument against “the lessons of history”. I can’t remember where I got it from, or whether I made it up, but here’s the idea:
1. In order to learn the lessons from history, you have to correctly choose which historical example is relevant to the current situation.
2. You then have to correctly generalise from the historical example to extract the underlying principles which form the lesson from history.
3. And then you have to correctly apply those principles to the current situation.
4. All of which seems much more difficult than just knuckling down and solving your own problem, without messing around hoping that reading stories about the past will help.
I kind-of sort-of believe this. Certainly, at some point in the future, I am going to expand on my “Northern Granma Theory of History”, which could be summarised by saying that most of the history of Europe should just be left to lie, because it only causes arguments.
However! Let’s consider this extract from a description of an artificial intelligence chatbot:
“This dependence of the probability on what came earlier is a marked characteristic of the sequences of letters given by a language such as English. Thus: what is the probability that an s will be followed by a t? It depends much on what preceded the s; thus es followed by t is common, but ds followed by t is rare. Were the letters a Markov Chain, then s would be followed by t with the same frequency in the two cases.
These dependencies are characteristic in language, which contains many of them. They range from the simple linkages of the type just mentioned to the long range linkages that make the ending “… of Kantian transcendentalism” more probable in a book that starts “The university of the eighteenth century …” than in one that starts “The modern racehorse …”.
The interesting thing about this extract is that it isn’t from an article about OpenAI – it’s taken from pages 170-171 of “An Introduction to Cybernetics”, written by the psychiatrist and mathematician W Ross Ashby in 1956 (by way of scale, this was the year in which an IBM 650 became the first non-domestic computer imported to the UK). I’ve taken it not (just) to show that the dream of AI has been around for a while, but because this extract comes in the context of a discussion of Markov chains and control systems which I think implies an interesting philosophy of history.
A “Markov chain” is a sort of generalisation of the “random walk” – it can deal with situations where the next step is not wholly random. The crucial property it preserves from the random walk, though, is that of having zero memory. The “Markov” criterion for a process is that its next state depends only on its current state; there’s no “path dependence” on how the process got there.
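If it helps to see that in code rather than prose, here’s a toy version – nothing below is from Ashby, the letters and probabilities are made up purely for illustration. It’s a first-order chain over a handful of letters, where the next letter is drawn from a distribution that depends only on the current one.

```python
import random

# Toy first-order Markov chain over a few letters. The next letter is drawn
# from a distribution that depends only on the current letter -- the process
# keeps no memory of how it arrived there. Probabilities are invented.
TRANSITIONS = {
    "d": {"e": 0.6, "s": 0.3, "t": 0.1},
    "e": {"s": 0.5, "d": 0.3, "t": 0.2},
    "s": {"t": 0.4, "e": 0.4, "d": 0.2},
    "t": {"e": 0.7, "d": 0.2, "s": 0.1},
}

def step(current):
    """Choose the next letter using only the current letter."""
    dist = TRANSITIONS[current]
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate(start, n):
    out = [start]
    for _ in range(n):
        out.append(step(out[-1]))  # only out[-1] matters, never the rest of out
    return "".join(out)

print(generate("e", 20))
```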
What Ashby is explaining in this section about text analysis, though, is that you can often “recode” a non-Markov process to have this property, by including some of the past path in the current state. If a process goes from state C to either state X or state Y depending on how it got there, you might still be able to describe it as a Markov chain by saying that (A,C) is one state and (B,C) is a different state, and so on, just as the probability of s being followed by t depends on whether the previous letter was d or e. This obviously gives you a much bigger “transition matrix” of states for the system to be in, but the big matrix still has the Markov property. In many cases that means it is considerably easier to deal with mathematically than a smaller matrix which doesn’t.
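Again as a toy sketch rather than anything from the book (the numbers are invented, only the structure matters): over single letters the process below is not Markov, because the chance that s is followed by t depends on what came before the s; fold the previous letter into the state and the Markov property comes back.

```python
# Recoding: this process is *not* Markov over single letters, because the
# probability that "s" is followed by "t" depends on what preceded the "s".
# Treat the pair (previous, current) as the state and the Markov property is
# restored, at the price of a bigger transition table. Numbers are invented.
PAIR_TRANSITIONS = {
    ("e", "s"): {"t": 0.70, "e": 0.20, "d": 0.10},  # "es" is often followed by "t"
    ("d", "s"): {"t": 0.05, "e": 0.65, "d": 0.30},  # "ds" is rarely followed by "t"
}

def p_next(prev, cur, nxt):
    """Probability of the next letter, given the recoded state (prev, cur)."""
    return PAIR_TRANSITIONS[(prev, cur)].get(nxt, 0.0)

# Same current letter "s", two different histories, two different predictions --
# but each prediction uses only the (recoded) current state.
print(p_next("e", "s", "t"))  # 0.7
print(p_next("d", "s", "t"))  # 0.05
```

A full recoding would enumerate every (previous, current) pair – hence the much bigger transition matrix – but the two pair-states above are enough to make the point.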
More interestingly, you can sometimes apply this recoding trick in reverse. If a system has unobservable states, then you might nevertheless be able to improve your prediction of its next move by looking at how it got there. You might be able to infer something to the effect that (A,C) corresponds to a hidden parameter set to 1 and (B,C) to a hidden parameter set to 0. Ashby’s explanation might be clearer (and there’s a toy version in code after it):
“Thus, suppose I am in a friend’s house and, as a car goes past, his dog rushes to a corner of the room and cringes. To me the behaviour is causeless and inexplicable. Then my friend says ‘He was run over by a car six months ago’. […] Memory is not an objective something that a system either does or does not possess; it is a concept that the observer invokes to fill in the gap caused when part of the system is unobservable. The fewer the observable variables, the more will the observer be forced to regard events of the past as playing a part in the system’s behaviour”.
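To make the same point in toy code (again, everything here is invented – it’s just the (A,C)-versus-(B,C) story from above): an observer who only sees the current state finds the system’s next move a coin flip, while one who remembers how it got there can predict it, because the path is standing in for the hidden parameter.

```python
import random

# A hidden parameter h steers the process: h = 1 routes A -> C -> X,
# h = 0 routes B -> C -> Y. The observer never sees h, but the path
# into C carries the same information. Everything here is invented.
def run():
    h = random.randint(0, 1)  # unobservable "memory" of the system
    return ("A", "C", "X") if h == 1 else ("B", "C", "Y")

runs = [run() for _ in range(10_000)]

# Observer 1 sees only the current state C: what follows looks like chance.
p_x_given_c = sum(r[2] == "X" for r in runs) / len(runs)

# Observer 2 also remembers how the process reached C; the recoded states
# (A, C) and (B, C) reveal the hidden parameter.
from_a = [r for r in runs if r[0] == "A"]
p_x_given_ac = sum(r[2] == "X" for r in from_a) / len(from_a)

print(f"P(X | C)      ~ {p_x_given_c:.2f}")   # about 0.5
print(f"P(X | A -> C) ~ {p_x_given_ac:.2f}")  # 1.0 in this toy version
```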
“History” is that part of the past which might be considered to describe an unobservable state of the system in the present. The rest is just stories.