To a psychologist interested in cognition, what jumps out about Goodhart's law is the behavioral aspect. Measuring something OR making it a target, then labeling it, making it public, or otherwise calling attention to it, changes people's behavior. People will always try to game a system for maximum benefit, in any context; that is rational behavior. Therefore, in many cases, announcing a measurement or target will, through human cognition, alter people's behavior and change the meaning of the target measurement.
Simple example: if you are targeting a particular measurement in medical lab tests, and you make that the goal while keeping on with unhealthy lifestyle choices, then the target, the measurement, has become a poor representative of health. Its original meaning has changed. It has been changed precisely because the measurement was made a goal, a proxy for the real goal (health), and this invited "gaming the system."
Perplexity.ai had a few worthy "comments":
'Goodhart's Law is an adage that states, "When a measure becomes a target, it ceases to be a good measure"' - That had four citations, so that is the proper pithy version I guess. The bot gave several good examples, including this:
'An illustrative example of Goodhart's Law is the bounty on cobras in colonial India, where citizens started breeding cobras to receive the reward, ultimately increasing the cobra population'
And a fair summary: 'The law suggests that when a particular measure is used as a target or goal, people tend to optimize their actions to meet that target, often at the expense of other important aspects or unintended consequences.'
The pithy version of Goodhart's law I ran into before was "a measurement used as a criterion becomes a poor measurement."
The underlying reason is that a measurement or target often leads to an emphasis on one selected variable (such as GPA or GDP). That decision is a behavior, a choice influencing attention or policy.
This behavior (creating a target or key measurement for some important ongoing decision-making process) commonly leads to two related problems: (1) tunnel vision -- the neglect of potentially more important variables, and (2) gaming the system.
Take those two examples, GPA and GDP (!). Concentrating on GPA, as a student, may lead to avoiding tough classes that would be more beneficial to one's learning in the long run. Concentrating on GDP may lead to ignoring other important variables in an economy, which is close to Goodhart's original context, apparently.
In very different subject areas, then, making a variable into a target or criterion can have the effect (mediated by human decision making) of making that variable a poor gauge of success (a "poor measurement").
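To make that breakdown concrete, here is a toy simulation -- my own illustrative sketch with made-up numbers, not anything from Goodhart or the post. Each person splits effort between a channel that genuinely improves the underlying goal (and moves the measurement too) and a cheaper channel that only moves the measurement. Once the measure becomes the target and effort shifts to the cheap channel, the measurement stops tracking the goal:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy sketch: each person splits effort between two channels.
#   "real" effort   -> improves both the true goal (health) and the measured proxy
#   "gaming" effort -> improves only the measured proxy, and more cheaply
def proxy_vs_goal_correlation(gaming_share):
    effort = rng.uniform(0.0, 1.0, size=n)                    # total effort per person
    real = effort * (1.0 - gaming_share)                      # effort on the real goal
    gamed = effort * gaming_share                             # effort spent gaming the measure
    health = real + rng.normal(scale=0.1, size=n)             # the thing we actually care about
    lab = real + 2.0 * gamed + rng.normal(scale=0.1, size=n)  # the measured proxy
    return np.corrcoef(health, lab)[0, 1]

# Before the measure is a target, little effort goes into gaming; afterwards, most does.
print("measure only,     corr(measure, goal):", round(proxy_vs_goal_correlation(0.1), 2))
print("measure = target, corr(measure, goal):", round(proxy_vs_goal_correlation(0.9), 2))
```

The exact numbers are arbitrary; the point is only that the correlation drops sharply once most of the effort flows through the channel that moves the measurement without moving the goal.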
I appreciate the brevity of your posts, Mr. Davies, and I understand the impulse to keep 'em short. (My own writing is too verbose, for sure.)
That said, I could really use a few more sentences explaining your last paragraph of this one.
thanks very much, there will be a part 2
One difficulty in applying Goodhart’s law now is that countries that can afford it dedicate tremendous amounts of human activity to validating the results of nonsense tests for advancement as though they measured what we originally cared about. LLMs are more precisely and exclusively taught to such tests, and I worry that they will be used not to invalidate those tests, but to automate some of the bullshit, so that we can accommodate the demographic transition by eliminating some jobs without admitting how much of our business was bullshit in the first place.
Seems to me like there’s a straightforward answer on the affirmative action question. If deviations from the easy-to-survey but noisy target are used to allocate more precise but costly investigative capacity to distinguish bias from luck, rather than directly triggering offsetting measures, that’s not a quota - or at least it wouldn’t be if the scrutiny weren’t itself costly to the employers being scrutinized.
Close analogy to the system of policing and courts, I think. Ideally, police attention or even an arrest is not a punishment. In practice it is, but the ideal by which the system is justified clearly calls for minimizing pre-trial penalties, so that police or prosecutorial attention is more like a search mechanism triggering an investigation than a deterrent in itself.
yeah, I think that with affirmative action there's a genuine difference of opinion between people who (in my slightly strained terms) view diversity as a desirable outcome variable in itself, and people for whom it's an imperfect measure of an unobservable "unbiased and meritocratic admissions process".