There are some occasions when someone says something that clearly seems totally normal to them, but which gives you a clue that you might have got yourself into something weird. Once upon a time, in the brief “golden boy” phase of my career at the Bank of England (which preceded the “undignified swan dive” phase), I was adjacent to the drafting of a piece of central bank communication. During the process I had one of those “ello ello, something is up here” moments.
A colleague uttered the sentence (entirely earnestly):
“well, three months ago we said that we were ‘actively considering’, so if we just write now that we are ‘considering’, that will definitely be taken as significant”.
No word of a lie. They had even brought along the report from three months earlier to show that they were right. The scary thing is that this isn’t even really all that ludicrous; on the other side of the fence, I have certainly seen monetary policy statements parsed at the level of individual words or even commas.
What’s stuck with me is that part of the function of boilerplate is to be boilerplate. There are some places in text where you need to communicate the additional, meta-textual information that you are not, at this point, trying to have an original thought and that while this paragraph needs to be carefully read once to confirm that it is boilerplate, it can thereafter be skimmed. This is true in both positive and negative aspects; there are many places in my professional writing where I want to refer to the “total loss-absorbing capital” of a financial institution, but where I would never use that phrase as it has a particular regulatory meaning and it would slow my readers down a lot to have to stop and worry whether I’m talking about that or generically.
I think this is part of the reason why LLMs seem to do so badly with legal texts. One of the tricks used to make transformer neural networks produce more human-sounding output is to introduce a bit of randomness (the “temperature” parameter), so that they sample from among the few most probable candidate next tokens rather than always glomming onto the single most likely one. Consideration of the function of boilerplate immediately shows why that’s problematic – as well as wasting everyone’s time trying to work out whether a minor change in verbal expression is significant, there’s a constant danger of creating something which actually does have a different effect from the boilerplate and changing contract terms by accident.
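To make the mechanism concrete, here is a toy sketch of temperature sampling. The logits and the four candidate continuations are invented for illustration; real models sample over tens of thousands of subword tokens, but the mechanics are the same – the logits are divided by the temperature before being turned into probabilities, so higher temperatures spread the choice across the runners-up.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample a token index from logits after temperature scaling.

    Higher temperature flattens the distribution (more variety);
    lower temperature sharpens it towards the most likely token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

# Invented logits for four candidate continuations of, say,
# "we are actively ___": considering / monitoring / reviewing / pondering
logits = [4.0, 3.5, 3.2, 1.0]
rng = random.Random(0)
samples_hot = [sample_with_temperature(logits, 1.5, rng) for _ in range(1000)]
samples_cold = [sample_with_temperature(logits, 0.05, rng) for _ in range(1000)]
# At temperature 1.5 the runners-up get picked regularly; at 0.05 the
# choice collapses almost entirely onto the highest-logit token.
```

The point for boilerplate is visible in `samples_hot`: even a modest temperature routinely swaps “considering” for a near-synonym, which is exactly the behaviour you do not want in a standard clause.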
Turning the temperature parameter down to zero (in fact, to a number very close to zero, to avoid division-by-zero problems) is not necessarily going to help either – if you do this, then the network becomes deterministic, producing exactly one output per input, so everything will depend on whether the solution to your legal problem happens to be the single most probable continuation of your prompt, and whether you’re able to find exactly the right prompt to make it so. The temperature parameter is also a large part of what makes the “go back and improve that” technique work – without the randomness, asking again just gets you the same answer.
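A toy illustration of why near-zero temperature behaves like a deterministic lookup (the logits are invented): as the temperature shrinks, the softmax piles essentially all of the probability mass onto the single largest logit, so sampling becomes indistinguishable from always picking the same token.

```python
import math

def softmax(logits, temperature):
    """Convert logits into probabilities after dividing by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.5, 3.2, 1.0]  # invented logits for four candidates
for t in (1.0, 0.1, 0.01):
    print(t, [round(p, 4) for p in softmax(logits, t)])
# As t shrinks, the first entry approaches 1.0 and the rest approach
# 0.0: one fixed output per input, with no "try again" variation.
```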
As far as I can tell, from my limited reading in the area, the issue is in the tokens themselves – the way that the dataset is coded and embedded. Boilerplate passages need to be treated as single, big, semantic units, but they are often made up of components which also need to be coded as tokens. (Boilerplate is very often made out of boilerplate – you put together a standard paragraph by choosing the standard set of standard sentences).
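A toy sketch of that tension, using a greedy longest-match tokeniser over a hypothetical vocabulary (real BPE/WordPiece vocabularies are learned from frequency statistics, but the same issue applies): when the vocabulary contains both a whole standard phrase and its components, the same words tokenise as one big unit in their boilerplate form and as small pieces otherwise.

```python
def greedy_tokenize(text, vocab):
    """Tokenise text by repeatedly taking the longest vocabulary match.

    Illustrative only: shows how a standard phrase can be coded as one
    semantic unit while its components remain tokens in their own right.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# Hypothetical vocabulary containing a whole boilerplate phrase
# alongside its component words.
vocab = {"total loss-absorbing capital", "total", "loss", "absorbing",
         "capital", " ", "-"}
print(greedy_tokenize("total loss-absorbing capital", vocab))
# -> ['total loss-absorbing capital']  (one big semantic unit)
print(greedy_tokenize("loss-absorbing total capital", vocab))
# -> ['loss', '-', 'absorbing', ' ', 'total', ' ', 'capital']
```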
This sort of learning the hidden structure in a large body of text looks like the sort of problem that transformer networks ought to be good at. Knowing which units of text are followed by which other units, and paying attention to higher-level structures that tell you what context you’re operating in is how LLMs produce recognisable and relevant answers. But my experience has certainly been that they have problems with boilerplate.
I think the reason is something I alluded to above – on first reading, a boilerplate paragraph needs to be read in detail, to check that it is what it appears to be, but thereafter it can be skimmed. So not only does the relevant token size change, it changes depending on what the paragraph means. A neural network of any kind can’t do what humans do, which is to change their approach to syntax depending on the semantics, because the network can only recognise a semantic change if it’s associated with a different structure.
All of which suggests to me that this isn’t an intrinsically impossible task, but that it’s difficult for AI models as they are currently implemented, and that in order to address it you would need a particular kind of dataset which has sufficient volume of boilerplate, boilerplate-adjacent text with significant differences, and material demonstrating the difference between the two. And human beings don’t seem to need this, which might suggest another way in which human thought is qualitatively different from this kind of algorithm. I used to laugh at people sweating over commas in press releases, or saying “I wonder what he meant by saying broadly flat rather than largely unchanged?”. But maybe they were engaged in a much higher human function than I had realised at the time.
Related experience: when I was a law clerk for a US federal judge, she gave us a very clear talk about not trying to rephrase settled legal standards in new words to make them sound more interesting, because it can give the appearance of an intentional change or, just as bad, start a process of unnoticed semantic drift.
I'm not sure I agree with the premise here. Central bank boilerplate is not contract boilerplate. One law firm's boilerplate will read differently than another's, and nobody cares much. What is important is that the acceleration clause is in place, there is a choice-of-law clause, a change-of-control clause, etc., etc. The style is rigid within firms because the boilerplate is assembled by junior associates, who can't be trusted with drafting ab initio. (There is also a Chesterton's fence issue--nobody in the firm knows why some clauses are there, but assumes that the earlier drafter knew what s/he was doing. Lawyers can learn a bit from modern computer programming techniques, but internal documentation doesn't fit well with billable hours.)