The first edition of the Grenfell book (which I learned about here) was one of the best I've ever read on the topic of learning from incidents and systemic change; I think about the quote from her mentor about knocking on closed doors and looking for open ones all the time. Thanks for recommending it and I'm excited for the new edition!
Indeed, sounds a much-needed book. Sadly Grenfell wasn't a surprise to anyone familiar with the myriad systemic problems within the building industry and the cowboys (that's being kind) who game the system. Will lessons be learnt? I doubt it.
So, I agree up to here:
"So you’re left with a model that has to be trained on the corpus of human decisions, and which will therefore be likely to reproduce human mistakes."
The conclusion you draw from this is that there would be no point to such a machine. But that seems under-supported. If the machine which reproduces human mistakes is cheaper to run than the human worker, why wouldn't a business prefer to make that switch?
(I suspect that the machine would in fact have a different distribution of mistakes than a human, and perhaps those mistakes would be so much more expensive than human mistakes [because, e.g., they seem so bizarre that nothing's set up to contain their effects] that even a very cheap machine would be a false economy. But I do think you need a steady like that in the argument.)
"Step like that", not "steady like that"!
I think implicitly I am saying that the machine (fully-costed including the training data) is more expensive than the human being and less adaptable
in fact, and this probably needs a full separate post, it produces "the decisions that a human being would make if they were restricted to data which was current as of the last training date".
I think the dream of doing this sort of thing at an industrial scale depends very much on the cost of inference being comparatively low, but this is a mirage, like the mirage of AI managers being more efficient because they are creatures of pure logic.
If you put an LLM in the place of a middle manager, then as far as I can see you are on the hook for ongoing training cost and updating of the training dataset, because you need it to replicate someone who is (insert favourite joke about whatever prick in your office this isn't true of) constantly learning and changing their views in line with experience.
The other interesting possibility of course is to redefine and re-engineer the entire information workflow around LLM decision making, but that is an absolutely huge project which might take a few more cups of coffee for me to think about.
I'm not going to pretend to expertise on the costs here, but doing this on the literal back of an envelope in a waiting room: say 10^9 dollars to train a model from scratch. Say a useful life of 5 years. Say it gets used in parallel for 2x10^4 positions. Then the fixed costs are recouped at 10^4 dollars per year per position. Now we do need to add on run-time costs ("inference") and (as you say) retraining. But retraining / updating is vastly cheaper than initial training and development. Even if those operating costs add up to 4x10^4 dollars per position per year, total 5x10^4, we'd still be looking at something rather cheaper than the median starting salary from the undergraduate program I teach in.
(The points to undermine the arithmetic would be shortening the lifecycle of the model and/or cutting the number of positions it gets used in, in parallel.)
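For anyone who wants to poke at the envelope, here is a minimal Python sketch of the same arithmetic. The function name and the second "stressed" scenario (a 2.5-year life across 5,000 positions) are made up for illustration; only the first set of figures comes from the comment above, and none of it is real cost data.

```python
def cost_per_position_per_year(training_cost, useful_life_years, positions,
                               operating_cost_per_position=0.0):
    """Amortised fixed cost plus yearly operating cost, per position."""
    fixed_share = training_cost / (useful_life_years * positions)
    return fixed_share + operating_cost_per_position


# Figures from the envelope: 10^9 dollars, 5 years, 2x10^4 positions.
fixed_only = cost_per_position_per_year(10**9, 5, 2 * 10**4)
with_operating = cost_per_position_per_year(10**9, 5, 2 * 10**4, 4 * 10**4)

# Hypothetical stress case (assumed numbers): half the life, a quarter of the positions.
stressed = cost_per_position_per_year(10**9, 2.5, 5 * 10**3, 4 * 10**4)

print(f"fixed share only:    {fixed_only:,.0f}")
print(f"with operating cost: {with_operating:,.0f}")
print(f"stressed scenario:   {stressed:,.0f}")
```

With the envelope figures this gives 10,000 and 50,000 dollars per position per year. Under the made-up stress case it comes out at 120,000, which shows how quickly the comparison with a salary turns on the lifecycle and headcount assumptions.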
Yes I think the big question is how many jobs you could replace (particularly since this is an adversarial context, as I flagged but didn't have the space or brains to go into). A lot of it kind of boils down to the question "would you be happy replacing five people with one here, even if you knew that one person was capable of doing the work of the other four". To which the answer is "probably yes if I could guarantee there would be no edge or corner cases, but in the real world you are taking a hell of a risk by losing viewpoint diversity"
Also, thinking about it, there are really not many companies who employ 20,000 middle managers in the first place, so that level of headcount amortization depends on being willing to share with potential competitors, which could have some really interesting game theory implications.
The labs are working on "continual learning" and there have been no loud claims of impossibility as yet, so the human approach of general training supplied by the model vendor/business school, and per-firm on-the-job training, seems feasible.
For, as Cosma says, lower cost than the equivalent humans.
Such an approach would probably double the working life of LLMs too.
Viewpoint diversity could be retained by using models from 3 vendors with majority voting. Still cheaper than one human per position in all likelihood.
With cognitive capacity separated from a physical body, there is now room for experimentation in org chart shapes.
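The three-vendor majority vote mentioned above could be sketched roughly like this in Python. The decision labels, and the idea that each vendor's model returns a single directly comparable answer, are assumptions for illustration rather than a description of any real vendor API.

```python
from collections import Counter


def majority_decision(votes):
    """Return the answer backed by a strict majority of votes, or None if there is none."""
    if not votes:
        return None
    decision, count = Counter(votes).most_common(1)[0]
    return decision if count > len(votes) / 2 else None


# Hypothetical answers from three different vendors' models to the same question.
print(majority_decision(["approve", "approve", "reject"]))
print(majority_decision(["approve", "reject", "escalate"]))
```

The first call yields "approve"; the second yields None because no answer has a strict majority. In that case something else (presumably a human) still has to break the tie, which is exactly where the viewpoint-diversity question comes back in.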
I suppose we could also say that the true cost of initial model development and training should be much more than 10^9, if the developers have to actually pay for training data.
It's too trite to be universally true, but when I'm feeling lazy I assert that all hard problems are matters of judgement rather than of following the correct rules.
I don't think that's too trite! It's basically Gary Klein's work on expertise in a nutshell.
I think you can construct classes of things that can reasonably be described as "hard problems" that are at least arguably not judgement issues, e.g. deep math. For everyday stuff though, it seems like a decent heuristic, at least as a way to push back on people trying to control work they don't fully understand by insisting on proceduralising it.
"Unless you can hire an army of offshore workers to tag management decisions as “good” and “bad”, I suppose. But that sounds expensive."
Use business school materials (Management, Strategy, maybe Org. Behavior) to train an LLM or something more general, and see how it does. The corpus of such materials is a lot less than the whole web. This doesn't sound too expensive. Tell it to focus on what was known or could have been known when the decision was made, but to judge based on results ex post. Doesn't sound expensive relative to the potential benefits.
Nice idea but I think business school case study materials are very much designed to convey a specific point; training the model this way will lead it to over-index on simple lessons and miss the complexity of the real situation that it is thrown into. (Again, what I am doing here is reproducing a very common critique of human brains which have been trained on business school case studies!)
(at last we have reproduced the MBA graduate who knows it all, from the famous Harvard Business School case study "Do not create the MBA graduate who knows it all" ;-))
Interesting as ever. Two things.
First, I think the question of whether one can replace middle management with AI is one which we are testing empirically, so I am loath to leave hostages to fortune by predicting how it will turn out. My intuition is that if AI can do that, it can definitely do top management even better, and a whole load of other things besides, and the question will become pretty uninteresting.
Second, a lot of the discourse on this is missing a key component. While I do greatly value the systems/Stafford Beer view, it's only a model, and in reality middle management isn't only or even mostly an information processing job, it's a human relations job. You need to persuade senior managers and the staff being managed. Most good managers do this through building strong human relationships. Even if we spot AI the ability to do this in the abstract, arguendo, my hunch is that many people will resist being managed by an AI, or at minimum will react very differently to an AI middle manager asking or telling them to do something.
I'm struck by the parallel with assessment in education. Computers have been better at this on technical metrics since long before AI and LLMs. The dark secret is how terrible humans are by comparison. (The penny-drop moment for me was a study on reliability where experienced human markers gave different grades to the exact same essay inserted at different points in a big stack of marking.)
The one thing humans can do easily and computers struggle with is convincing people that another human being cares about their work.
Middle management is distinctly not the same thing as running the whole business. And it's not at all obvious that management is anti-inductive the way that markets are (I could be persuaded that it is, but you would have to actually make the case).
Thanks, book pre-ordered!
Paul Watzlawick said that the relationship steers the matter, not the other way around. That understanding seems appropriate here. I am testing an LLM accordingly.
Training the model to replicate human decisions based on the inputs they have at a given moment sounds like the Reinforcement Learning approach, which can be quite powerful.
The difficulty in using it for management performance optimisation feels similar to media mix modelling in marketing analysis somehow.
Very interesting. Quick question: if you were in charge of picking a cricket team, would you develop an LLM, use another computer system or stick with the meat sack option?
Cricket I think is very much on the borderline between LLM and human (I'm assuming we could use the LLM with video input for training and input). Baseball already has that film with Brad Pitt made about picking teams with considerably less sophisticated algorithms so I guess it's on the other border of LLM.
At least in baseball, I think you would have to use both. There are a number of social factors in picking a team. For example, do you promote a rising young player to a secondary role, or retain a slightly-less-talented older player who will not chafe in such a role? (In baseball, the younger player usually has an indenture, and can be retained in the minor league for a few years.) This will require a fair knowledge of the individuals' psychology.
The market is extremely competitive, resulting in an industry structure where a small number of people outperform the market, like Warren Buffett or Bill Ackman or whoever you think is good, and the rest average out to the market, or below when transaction costs are figured in. Which is why index funds are popular. For LLMs to be good stock pickers, they would have to outcompete not only all the mediocre people, but also the Warren Buffetts and Bill Ackmans. That would be hard to do if the LLMs are trained on generic public info, rather than on the proprietary knowledge that these people have developed over decades and that they aggregate through their organizations and connections. Furthermore, to be good stock pickers, LLMs would have to outcompete other LLMs, meaning probably thousands or millions of LLMs. What seems more likely to me is that the biggest/best investment companies will develop specialized LLM tools to aid them in their stock picking, further widening the gulf between the best and the rest.
Good article Dan. The underlying issue is that these and many similar companies significantly over-hired a few years ago. Likely for many reasons (e.g. post-Covid, short-term needs) but clearly the lack of a mid-to-long-term workforce strategy was underpinning their hiring approach. Cut to a few years later and they can now absolve themselves of this by pointing to their successful AI as being the reason for the job cuts.
Yeah, there are plenty of people working in middle management jobs which could reasonably be summarized as "guy who copies numbers from emails into a spreadsheet then reads the results out at a meeting" but if you hired a couple of hundred of them then that is kind of on you, rather than a vision of the future.