Related experience: when I was a law clerk for a US federal judge she gave us a very clear talk about not trying to rephrase settled legal standards in new words to make them sound more interesting, because it can give the appearance of an intentional change or, just as bad, start a process of unnoticed semantic drift.
I'm not sure I agree with the premise here. Central bank boilerplate is not contract boilerplate. One law firm's boilerplate will read differently than another's, and nobody cares much. What is important is that the acceleration clause is in place, there is a choice-of-law clause, a change-of-control clause, etc., etc. The style is rigid within firms because the boilerplate is assembled by junior associates, who can't be trusted with drafting ab initio. (There is also a Chesterton's fence issue--nobody in the firm knows why some clauses are there, but assumes that the earlier drafter knew what s/he was doing. Lawyers can learn a bit from modern computer programming techniques, but internal documentation doesn't fit well with billable hours.)
Well, no one cares about it until the litigators get there. A colleague just won an important motion based in part on the use in one boilerplate provision of "concerning," in another of "concerning the subject matter of," and in yet a third of "related to." Did the drafter (read junior associate hitting copy and paste) carefully consider the use of each, and its meaning relative to and in the context of the others? The law presumes they did!
Oh, dear: the First Law of Drafting. No style points for elegant variation!
1. It is definitely true that tokenization assumes that words or even parts of words are the relevant unit of understanding.
2. Modern neural nets could definitely do well on the boilerplate analysis task, and there is certainly enough data for them to work with.
3. However, the bias in modern AI is very much in favor of letting the model learn that it should have the kind of structure you describe, rather than building it in.
4. I bet modern LLMs would be excellent at the boilerplate analysis tasks described here. But they would be very bad at boilerplate generation, both for the reasons described and for a more basic one: they aren't nearly as good at precision in large volumes of output as they are at getting exactly the right one-word answer.
One of the enduring memories I have of my first year as a trainee lawyer nearly 20 years ago was being told “Elegant Variation is for literature; if you use it in drafting you are letting down your client and this firm.”
A concrete example is “best/reasonable/whatever efforts.” These are largely meaningless phrases in contractual drafting* that lawyers use to paper over the fact that their principals haven’t really agreed on whether someone has to do something. That changes if you are careless enough to write “best efforts” in one place and something like “reasonable efforts” somewhere else, in which case you have now made it clear that one standard is higher than the other.
*Yes, I know the case law, or at least up to 2015 or so, and yes, US/UK/Aus are different, and in the Commonwealth we normally say endeavours, not efforts, but this is still broadly correct.
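For anyone who wants to check a draft for this mechanically, here is a minimal sketch in Python. It assumes a plain-text contract and a short, hand-picked list of qualifiers; the pattern and the function names (`find_efforts_variants`, `has_elegant_variation`) are hypothetical illustrations, not any firm's actual tooling, and a real review would need a much longer phrase list.

```python
import re
from collections import defaultdict

# Assumed, illustrative list of qualifiers; real drafting uses many more variants.
EFFORTS = re.compile(
    r"\b(best|reasonable|commercially reasonable|all reasonable)\s+"
    r"(efforts|endeavours|endeavors)\b",
    re.IGNORECASE,
)

def find_efforts_variants(text):
    """Map each efforts-style qualifier to the character offsets where it occurs."""
    found = defaultdict(list)
    for match in EFFORTS.finditer(text):
        found[match.group(1).lower()].append(match.start())
    return dict(found)

def has_elegant_variation(text):
    """True if the text uses more than one distinct efforts standard."""
    return len(find_efforts_variants(text)) > 1

contract = (
    "The Seller shall use best efforts to deliver the Goods. "
    "The Buyer shall use reasonable efforts to obtain financing."
)
for qualifier, offsets in find_efforts_variants(contract).items():
    print(qualifier, offsets)
```

A draft that triggers `has_elegant_variation` is not necessarily wrong; the point is only that each mixed pair deserves a deliberate decision rather than an accidental one.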
"Boilerplate" is semantically encoded both to communicate and to signify a particular rhetorical stance, shared world viewpoint, or other connotative structure(s). It's often performative, in the sense of Searle's "I now pronounce you man and wife," which is a statement that performs work apart from any communicative value it has. You might even think of it as antlike--the ant's burst of pheromones might indicate "Food this way," but it also connotes "I am of this colony" to the recipient, with all that entails.
Humans are, of course, large language models themselves, and also need significant training for specialized uses. We usually call this "socialization" or "professionalization" or "acclimatization" to the way we do things here at the bank, or government agency, or Death Star. Essentially it's training in how to read boilerplate...but also how to recognize something that belongs to the nest, i.e. the "right" boilerplate. This process isn't perfect for humans, either, and not all of them are equally competent in detecting nest-pheromones; cue the excruciating three-hour sessions I used to sit through drafting a two-paragraph memo with a group of thirty people at Social Security.
This rough point is pretty much the basis of my argument that we are about to see a dot-com-style AI crash. https://backseatpolicycritic.substack.com/p/ai-is-a-scam
There’s probably also a load of work generation here, insofar as occasionally these kinds of institutions have to make a set of “big moves” that require a whole bunch of human capital, including analysis, engagement, report writing, etc. So when, for the rest of the time, a company or organisation is merely maintaining steady state, all those people on standby for the “big moves” need to invent work for themselves by turning little words into Important Things.