14 Comments

Not to get all Freudian, but calling the consultation AI 'Humphrey' rather than, say, 'Bernard' hints at a mindset of subtle obstructionism rather than a genuine attempt to help.

hahaha yeah, it was put together by people who thought Sir H was the hero of that show

I was a pointy-headed bureaucrat: USAian. As far as vibe-checking went, we didn't need formal consultations. We knew which wheels were likely to squeak, and talked to them in advance of everything. They were also useful in preparing snag lists. (I wasn't an environmental bureaucrat, so I didn't have to worry about NIMBY sandbags: somebody producing a snail darter or spotted owl at the last moment.) The formal process was best for venting: especially public hearings. Comments were also useful for venting, because interested parties had constituents, who needed to know that their groups were on the ball.

"how might we redesign the whole process to take advantage of this new technology?"

I wonder why regulators don't use forum-like tools to get feedback more often... They would let individuals interact and vote on arguments, so a regulator could filter out the original and valuable contributions. But agencies seldom collect and analyse feedback on new directives even from their own employees...

Maybe they're afraid everything would turn into something like Twitter, which is a reasonable fear. And deploying a "forum-like tool" doesn't sound as fancy as "we developed a new AI assistant" etc.

My experience from the receiving end could be summarised, to use Mr Davis’s term, as “forums: good for snag-finding, awful for vibe-checking”, because the opinions of “people who want to spend a chunk of their spare time arguing about <thing>” are typically not at all representative of the opinions of “users of <thing>”.

(Also, for snag-finding, voting is ~useless for the same reason that LLMs are - you need people with sufficiently good judgement to read *everything* if you want it to be effective.)

Interesting as usual. Maybe I am reading too much into this, but I have the impression that our blogger depicts project decisions as the result of well-meaning Wykehamists, oops, I mean centrists, oops again, I mean technocrats, trying to figure out what is best for everybody, who then need to get vibes or list snags to see whether those plans are achievable. In the same way, I have the impression that he thinks high infrastructure costs "happen" and that unaccountability "emerged" (and I think many reckon that the house price and rent boom and wage deflation have been similarly "spontaneous"), and that we need technical fixes for all of these issues.

My long-standing impression is instead that, regardless of whether these issues were emergent, they serve the interests of those who matter, and fixing them requires those opposed to them to gain strong political power; otherwise, proposing technical fixes is fatuous, even when they are interesting to discuss.

Yeah, cos vibe-checking is stats: 1. roughly group responses by synonymy (in effect, statistical interchangeability in the LLM's training data); 2. count them.

But snag identification basically needs real (good) judgement about the real world; and 'what was on the Anglo internet 5 years ago' is not going to produce that however you slice it.

Also law, which LLMs are specifically really bad at (inventing cases, flipping the valence of judgements), perhaps because it too involves human sensibilities, judgement, analogy, knowledge of precedent etc., while to the LLM it sounds like 'wah wah, reasonable, wah wah, Carruthers v East Acton Corporation, not without merit, wah wah', where all arrangements of different names are about equally probable, as perhaps in many contexts are 'unlawful' and 'lawful', etc.

Lots of food for thought there. It's interesting to consider how a few people raised the issue of flooding on the Manchester Airport link road during the planning stages; the road was then built, and it floods, not quite every year, but often enough that I think one can make the case that snagging needs to be taken more seriously than it often is.

As to the closing question: how is the £80m actually split up? Market researchers do vibe checks for companies and it doesn't seem to be a high margin gravy train, so I wonder if the (plausible) difficulty you point to in using the system for the snags side means the savings aren't that large.

£80m is the whole cost of policy consultations: they do about 80 of them a year, and typically the external spend is 100 grand. I can sort of buy that, because it seems like a much bigger exercise than all but the biggest market research clients would ever commission. The document says (and I have no reason to gainsay it) that they regularly get upwards of 30,000 responses to the kind of consultation they are talking about.

If you want that turned round in less than a couple of months, you are going to need a dozen staff or so, so at a reasonable day rate, allowing for profit and writing-up time, the cost gets there. (I have been told on social media that the kind of temp graduate employees who get hired for this sort of job typically have absolutely no background in the kind of qualitative analysis you might hope for.)

The interesting thing about the Airport Relief Road, and the earlier Alderley Edge bypass (which suffers from the same problem on a lesser scale), is why the engineers didn't see that burrowing under an existing road or railway line would create a dip in which water would gather.

This vibe-check/snag-list concept makes plenty of sense if you think about the fundamental way modern language models work. They're based on statistical probabilities of word frequency, not the meaning of sentences. So when you get dozens, hundreds, or thousands of letters from citizens, there are enough words in that pile that the frequency and usage of words is enough to reach a reasonable conclusion about sentiment. Thus, "vibe check" is a pretty reasonable use of software that works that way.

On the other hand, if the thing that you need to identify is relatively rare in the corpus of words, (e.g., just a few people point out the tight bend in the road), the AI is likely to assign it low significance because of its low frequency. It will therefore remove it or overlook it in the summary of what was important.
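A toy Python sketch of that "group, then count" logic and of how a frequency cut-off quietly discards a rare point. The keyword-to-theme map, the sample responses and the 10% threshold are all hypothetical, standing in for whatever grouping an LLM actually does; it illustrates the argument and is not how any real consultation tool works.

```python
from collections import Counter

# Hypothetical keyword-to-theme map, a crude stand-in for grouping by synonymy.
THEMES = {
    "traffic": {"traffic", "congestion", "queues"},
    "noise": {"noise", "noisy", "loud"},
    "bend": {"bend", "corner", "blind"},
}

def tag_themes(response: str) -> set[str]:
    """Return the themes whose keywords appear in a single response."""
    words = set(response.lower().replace(",", " ").replace(".", " ").split())
    return {theme for theme, keywords in THEMES.items() if words & keywords}

def vibe_check(responses: list[str], min_share: float = 0.1) -> dict[str, int]:
    """Count responses per theme, keeping only themes above a (hypothetical) share cut-off."""
    counts = Counter(theme for r in responses for theme in tag_themes(r))
    cutoff = min_share * len(responses)
    return {theme: n for theme, n in counts.items() if n >= cutoff}

# Invented sample: 60 traffic complaints, 35 noise complaints, 2 people spotting the snag.
responses = (
    ["Too much traffic already."] * 60
    + ["The noise will be awful."] * 35
    + ["The blind bend by the school is dangerous."] * 2
)
print(vibe_check(responses))
```

Run as-is, it prints `{'traffic': 60, 'noise': 35}`: the two responses about the blind bend are real, but at 2 out of 97 they fall under the cut-off and vanish from the summary.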

Finally, "model collapse" is the end result of this process. A citizen says "Gee, I'm not a very eloquent writer, but I want to write a compelling letter to the government about my opinion. Let me ask a large language model to take my 2-3 phrases and turn them into 2-3 paragraphs of eloquent prose." Then they submit AI-generated output as input to an AI. And since LLMs produce massive volumes of text with minimal effort, the first few consultations that go this way will receive heaps of AI-generated slop as supposed citizen-generated writing. And bad actors will leverage this to send piles of AI-generated slop to push their agenda. The bad actors' output will be difficult to distinguish from genuine citizens because lots of people will be using LLMs to write their text.

I'd like you, Guarino, Pahlka and maybe Patio11 always to discuss these things together first, since you are the ones best equipped to weigh in.

How much would the AI systems to carry out this analysis cost? Might their use require either lots of new "AI consultants" or lots of legal advice about what to do in the unlikely event that "the machine has f***ed up"?