(2023-01-23) Chin Goodharts Law Isnt As Useful As You Might Think
Cedric Chin: Goodhart's Law Isn't as Useful as You Might Think. Goodhart’s Law is a famous adage that goes “when a measure becomes a target, it ceases to be a good measure.”
It is descriptive; it tells you of the existence of a phenomenon, but it doesn’t tell you what to do about it or how to solve it.
many of these ideas were worked out by W. Edwards Deming and his colleagues in the 70s and 80s, as part of a body of work known today as ‘Statistical Process Control’.
Amazon’s Weekly Business Review
many of the ideas in the WBR were actually taken from the field of Statistical Process Control
the folks at Amazon have developed a number of ways to solve for Goodhart’s Law.
formulations of Goodhart’s Law that are more useful than the original form.
*The first step is to turn Goodhart’s Law into a narrower, more actionable formulation. The one that I like the most is from Deming contemporary Donald Wheeler, who writes, in Understanding Variation:
When people are pressured to meet a target value there are three ways they can proceed:
- They can work to improve the system
- They can distort the system
- Or they can distort the data*
This list of possible responses to quantitative targets is attributed to Brian Joiner
Joiner’s list suggests a number of solutions:
- Make it difficult to distort the system.
- Make it difficult to distort the data.
- Give people the slack necessary to improve the system.
Avoiding Goodhart’s Law requires you to also give people the space to improve the system. Which begs the question: how do you encourage people to do just that?
Before you can improve any system you must listen to the voice of the system
A target, goal, or budget usually represents some kind of ‘specification’
The ‘Voice of the Process’, on the other hand, is how the process actually works.
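One common Statistical Process Control tool for listening to the ‘Voice of the Process’ is the XmR (individuals and moving range) chart, which Wheeler covers in Understanding Variation. The sketch below is only an illustration under assumptions: the function names and the weekly data are hypothetical and do not come from the essay. It computes natural process limits from the data alone, ignoring any target, and flags points that fall outside them.

```python
from statistics import mean


def xmr_limits(values):
    """Return (centre, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 is the standard XmR scaling constant (3 / 1.128).
    return centre, centre - 2.66 * avg_mr, centre + 2.66 * avg_mr


def outside_limits(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly metric, for illustration only.
weekly_signups = [52, 48, 50, 51, 49, 53, 47, 50, 90, 51]
print(outside_limits(weekly_signups))
```

Points inside the limits are routine variation and say nothing about whether a target was "hit" or "missed"; only points outside them suggest that something in the process changed. In practice the limits are usually computed from a baseline period rather than from the full series.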
now GET ON IT!” This is a naive view of process improvement
it is not going to work for the kind of complex processes that you would find in a typical business
Business processes are often processes where you don’t know the inputs to your desired output
the first step is to figure out what those inputs are, and then figure out what subset of those you can influence, and then, finally, figure out the causal relationships between the input metrics and output metrics
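A first, rough screen for whether a candidate input metric moves with an output metric is a lagged correlation between the two weekly series. This is a minimal sketch, not the essay's or Amazon's method: the function names and the data are hypothetical, and a correlation is only a prompt for investigation, not evidence of a causal relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_correlation(input_series, output_series, lag):
    """Correlate the input at week t with the output at week t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(input_series, output_series)
    return pearson(input_series[:-lag], output_series[lag:])


# Hypothetical weekly data: newsletters sent (input) and MQLs (output).
newsletters_sent = [2, 3, 4, 5, 6, 7, 8, 9]
mqls = [40, 42, 39, 41, 40, 43, 38, 41]
for lag in range(3):
    print(lag, round(lagged_correlation(newsletters_sent, mqls, lag), 2))
```

A persistently weak correlation at every plausible lag suggests that the input may not be worth driving, which is the kind of conclusion a weekly review of input and output metrics is meant to surface.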
you have to ignore the goal first, in favour of examining the process itself.
The way you get to this state is nothing at all like obsessively watching your target and measuring how far off you are from it
How the WBR Accomplishes This
In 2000, Amazon suffered from a legendary (read: terrible) holiday season
It was close, but we made it. Shortly after that holiday season we held a postmortem, out of which was born the Weekly Business Review (WBR). The purpose of the WBR was to provide a more comprehensive lens through which to see the business.
the format of the meeting was influenced by folk with strong Operations Research backgrounds
The Amazon WBR is a weekly operational metrics review meeting in which Amazon’s leadership team gathers and reviews 400-500 metrics within 60-90 minutes. It occurs — or so I’m told — every Wednesday morning.
a more detailed description of the WBR may be found in Chapter 6 of Working Backwards.
Amazon divides its metrics into ‘controllable input metrics’ and ‘output metrics’. Output metrics are not typically discussed in detail
the majority of discussions during WBR meetings focus on exceptions and trends in controllable input metrics
How do you come up with the right set of controllable input metrics? The short answer is that you do so by trial and error.
“Hmm, we’ve been driving up promotional newsletters for a while now but there doesn’t seem to be a big difference in MQLs; maybe we should stop doing that”
One mistake we made at Amazon as we started expanding from books into other categories was choosing input metrics focused around selection, that is, how many items Amazon offered for sale.
Once we identified this metric, it had an immediate effect on the actions of the retail teams. They became excessively focused on adding new detail pages—each team added tens, hundreds, even thousands of items to their categories that had not previously been available on Amazon
did not produce a rise in sales,
This activity did cause a bump in a different output metric—the cost of holding inventory—and the low-demand items took up valuable space in fulfillment centers
ultimately finalized as - the percentage of detail page views where the products were in stock and immediately ready for two-day shipping, which ended up being called Fast Track In Stock.
Jeff (Bezos) was concerned that the Fast Track In Stock metric was too narrow. Jeff Wilke argued that the metric would yield broad systematic improvements across the retail business. They agreed to stick with it for a while, and it worked out just as Jeff Wilke had anticipated.
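The Fast Track In Stock definition above can be read as a simple ratio over detail page views. The sketch below is an assumption-laden illustration, not Amazon's implementation: the `PageView` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One detail page view (hypothetical record; field names are assumptions)."""
    in_stock: bool
    two_day_eligible: bool


def fast_track_in_stock(views):
    """Percentage of page views where the item was in stock and two-day ready."""
    views = list(views)
    if not views:
        return None  # undefined when there are no page views
    qualifying = sum(1 for v in views if v.in_stock and v.two_day_eligible)
    return 100.0 * qualifying / len(views)


sample = [
    PageView(in_stock=True, two_day_eligible=True),
    PageView(in_stock=True, two_day_eligible=False),
    PageView(in_stock=False, two_day_eligible=False),
    PageView(in_stock=True, two_day_eligible=True),
]
print(fast_track_in_stock(sample))
```

Because the denominator is page views rather than catalogue items, adding low-demand items that nobody looks at barely moves the number. That is plausibly why it resisted the kind of gaming seen with the earlier count of new detail pages.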
the WBR acts as a safety net — a weekly checkpoint to examine the relationships between controllable input metrics (which are set up as targets for operational teams) and corresponding output metrics
the WBR assumes that controllable input metrics are only important if they drive desirable outcomes
A larger point I want to make is that the WBR becomes a weekly synchronisation mechanism for the company’s entire leadership team
an insider who has been engaged in the WBR process over a period of months won’t see 500 different metrics — instead, their felt experience of a WBR deck is that they are looking at clusters of controllable input metrics that map to other clusters of output metrics. In other words, the WBR process forces them to build a causal model of the business in their heads.
this also explains why there are so many goddamn metrics in a typical WBR deck. Any output metric of importance in a business will typically be influenced by multiple input metrics. Pretending otherwise is to deny the multi-faceted nature of business
This is why it is often a mistake to ‘present a small, simple set of metrics’ or to anchor on a single ‘North Star metric’ for a business.
there are two other practices that Amazon uses to augment the WBR in order to prevent Goodhart’s Law-type situations.
The first is that the WBR is administered by a completely autonomous group — the Finance department. Each WBR meeting is opened by an individual from the Finance team, who takes note of all questions during the meeting and follows up with unfinished threads from the previous week’s WBR. Finance is responsible for certifying the data presented in the WBR; they sit in on the meeting
the most important function that Finance plays in the WBR process is that they are empowered (and incentivised!) to investigate and dive deep into any of the metrics that are presented during the WBR.
finance team should “have no skin in the game other than to call it like they see it,” based on what the data revealed
In addition, metrics owners and finance team members alike are expected to run separate auditing processes for metrics that matter
It can’t be that you want your org to run without numbers. And it can’t be that you eschew quantitative goals forever!
I think the biggest lesson of this essay is just how difficult it is to be data driven.