Tetlock put his pundits to an unaccustomed test by casting the majority of his forecasting questions in a "three possible futures" form:
The respondents were asked to rate the probability of three alternative outcomes: the persistence of the status quo, more of something (political freedom, economic growth), or less of something (repression, recession). And he measured his experts on two dimensions: how good they were at guessing probabilities (did all the things they said had an x per cent chance of happening happen x per cent of the time?), and how accurate they were at predicting specific outcomes. The results were unimpressive. On the first scale, the experts performed worse than they would have if they had simply assigned an equal probability to all three outcomes—if they had given each possible future a thirty-three-per-cent chance of occurring. Human beings who spend their lives studying the state of the world, in other words, are poorer forecasters than dart-throwing monkeys, who would have distributed their picks evenly over the three choices.
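Tetlock's two measuring sticks can be made concrete. The sketch below (with hypothetical forecasts, not Tetlock's data, and a plain Brier score rather than his full scoring apparatus) compares an expert's three-outcome probability estimates against the uniform one-third baseline of the "dart-throwing monkey":

```python
# Brier score over three mutually exclusive outcomes: the mean squared error
# between the stated probabilities and what actually happened. Lower is better.
# The uniform 1/3 guess earns the same score on every question, so it serves
# as the "dart-throwing monkey" baseline.

def brier(probs, outcome):
    """Mean squared error between forecast probs and the one-hot outcome."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)

# Hypothetical forecasts: (status quo, more of X, less of X), observed index.
forecasts = [
    ((0.7, 0.2, 0.1), 2),   # confident in the status quo; "less" happened
    ((0.1, 0.8, 0.1), 1),   # confident and right
    ((0.6, 0.3, 0.1), 0),   # moderately confident and right
]

uniform = (1/3, 1/3, 1/3)
expert_score = sum(brier(p, o) for p, o in forecasts) / len(forecasts)
monkey_score = sum(brier(uniform, o) for _, o in forecasts) / len(forecasts)

print(expert_score, monkey_score)
```

Note that the uniform forecaster always scores 2/9 per question regardless of what happens; Tetlock's finding was that his experts, on average, could not beat that fixed mark.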
I do not agree with Bryan Caplan's Tackling Tetlock, in which he notes that his own research "between economists and the general public… defends the simple "The experts are right, the public is wrong"" view, and then proceeds to indict Tetlock:
Tetlock's sample suffers from severe selection bias. He deliberately asked relatively difficult and controversial questions. As his methodological appendix explains, questions had to "Pass the 'don't bother me too often with dumb questions' test." Dumb according to who? The implicit answer is "Dumb according to the typical expert in the field." What Tetlock really shows is that experts are overconfident if you exclude the questions where they have reached a solid consensus. [Emphasis by author]
Being (presumed) right about a broad theme is not solid forecasting in my book. Caplan's opinion that "Experts really do make overconfident predictions about controversial questions… However, this does not show that experts are overconfident about their core findings" evades the point that when the chips are down and specificity is high, forecasters fall very short.
It is that specificity that moves Tetlock forward. From Carl Bialik's WSJ interview with Tetlock, Evaluating Political Pundits:
Tetlock's innovation was to elicit numerical predictions. As he noted in an interview with me, political punditry tends toward the oracular: statements vague enough to encompass all eventualities. [He] was able to get pundits to provide probability estimates for such questions as whether certain countries' legislatures would see shifts in their ruling parties, whether inflation or unemployment would rise and whether nations would go to war.
Without numerical predictions, "it's much easier to fudge," Prof. Tetlock told me. "When you move from words to numbers, it's a really critical transition." What he found is that people with expertise in explaining events that have happened aren't very successful at predicting what will happen.
Bialik's interview also tempers Menand's conclusion in Everybody's an Expert ("Think for yourself," ignoring any and all forecasting) by citing Tetlock's distancing himself from such a broad-brush statement:
[Tetlock] pointed out an exercise he conducted in the course of his research, in which he gave Berkeley undergraduates brief reports from Facts on File about political hot spots, then asked them to make forecasts. Their predictions -- based on far less background knowledge than his pundits called upon -- were the worst he encountered, even less accurate than the worst hedgehogs. "Unassisted human intuition is a bomb here."
We have seen the prediction/decision-making problem even among those of presumed 'assisted human intuition' such as commercial business managers. In When clients for risk assessment/risk pricing take on a risk of their own, we catalog these common characteristics:
- Believe that they are well informed even when they are ill informed
- Not invented here (NIH)
- Inability to distill
- Competitive bad advice
The client is often in no better, and sometimes worse, condition than the pundits who advise them. Part two of that same risk series, The merger of Inability to distill, Not invented here, and Competitive bad advice, refers "readers to the Berlin Wisdom Model as an intelligence analysis mindset and tool for its introduction to an approach to wisdom containing five broad areas without which I do not believe good analysis and prediction can occur." Those are:
- A fund of general knowledge
- Procedural knowledge
- An understanding of the relativity of values
- An understanding that meaning is contextual
- Acceptance of change
Pundits of all stripes would benefit by its adoption.
While it may be more entertaining to read what others say about Tetlock, it is valuable to consult Expert Political Judgment directly:
This book is predicated on the assumption that, even if we cannot capture all of the subtle counterfactual and moral facets of good judgment, we can advance the cause of holding political observers accountable to independent standards of empirical accuracy and logical rigor. Whatever their allegiances, good judges should pass two types of tests:
- Correspondence tests rooted in empiricism. How well do their private beliefs map onto the publicly observable world?
- Coherence and process tests rooted in logic. Are their beliefs internally consistent? And do they update those beliefs in response to evidence?
In plain language, good judges should both "get it right" and "think the right way."
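Both tests can be sketched mechanically. A minimal, hypothetical illustration (my construction, not Tetlock's instrument): a coherence check verifies that stated probabilities form a proper distribution, and a process check verifies that an update on new evidence follows Bayes' rule:

```python
import math

def is_coherent(probs, tol=1e-9):
    """Coherence test: probabilities are non-negative and sum to one."""
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Process test: the posterior Bayes' rule requires, given the evidence."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# A hypothetical judge's three-outcome forecast, then an update on evidence.
print(is_coherent((0.5, 0.3, 0.2)))    # a proper distribution
print(is_coherent((0.6, 0.5, 0.2)))    # overcommitted: sums to 1.3
print(bayes_update(0.3, 0.9, 0.2))     # evidence favoring the hypothesis raises 0.3
```

Correspondence ("get it right") still has to be scored against the observable world; coherence and process ("think the right way") can be audited from the judge's stated beliefs alone, which is what makes them testable at all.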
Many have pursued these goals (with tools no better than the forecast under examination), but few with the rigor that Tetlock employs, believing that confidence "in specific claims should rise with the quality of converging evidence… from diverse sources" just as confidence "in the overall architecture of our argument should be linked to the sturdiness of the interlocking patterns of converging evidence." Once again, that rigor creates a testable base open to further refinement.
Tetlock rightly defines "Getting It Right" as an elusive construct that can be approached with "correspondence theories of truth" that pair good judgment with the "goodness of fit between our internal mental representations and corresponding properties of the external world":
We should therefore credit good judgment to those who see the world as it is--or soon will be [the corollaries of which are] we should bestow bonus credit on those farsighted souls who saw things well before the rest of us [and] penalize those misguided souls who failed to see things long after they became obvious to the rest of us.
Tetlock describes a "gauntlet of five challenges" that is needed to assess the "superficially straightforward conception of good judgment" (Emphasis mine for clarity):
Challenging whether the playing fields are level. We risk making false attributions of good judgment if some forecasters have been dealt easier tasks than others...

Challenging whether forecasters' "hits" have been purchased at a steep price in "false alarms." We risk making false attributions of good judgment if we fixate solely on success stories--crediting forecasters for spectacular hits (say, predicting the collapse of the Soviet Union) but not debiting them for false alarms (predicting the disintegration of nation-states--e.g., Nigeria, Canada--still with us)...

Challenging the equal weighting of hits and false alarms. We risk making false attributions of good judgment if we treat political reasoning as a passionless exercise of maximizing aggregate accuracy...

Challenges of scoring subjective probability forecasts. We cannot assess the accuracy of experts' predictions if we cannot figure out what they predicted...

Challenging reality. We risk making false attributions of good judgment if we fail to recognize the existence of legitimate ambiguity about either what happened or the implications of what happened for the truth or falsity of particular points of view...
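The hits-versus-false-alarms challenges lend themselves to a simple scoring sketch (hypothetical weights and record, not Tetlock's own formula): a forecaster who predicts collapse everywhere is guaranteed the spectacular hit, but an asymmetric cost on false alarms exposes the strategy.

```python
# Credit hits, debit false alarms, with costs that need not be symmetric --
# the "unequal weighting" Tetlock's gauntlet demands.

def weighted_score(calls, outcomes, hit_value=1.0, false_alarm_cost=2.0):
    """Net score: hits earn hit_value, false alarms cost false_alarm_cost."""
    hits = sum(1 for c, o in zip(calls, outcomes) if c and o)
    false_alarms = sum(1 for c, o in zip(calls, outcomes) if c and not o)
    return hits * hit_value - false_alarms * false_alarm_cost

# Hypothetical record: disintegration predicted for five states; one occurred.
calls    = [True, True, True, True, True]      # predicts collapse everywhere
outcomes = [True, False, False, False, False]  # only one state actually collapsed

print(weighted_score(calls, outcomes))  # one spectacular hit, net negative score
```

Varying `hit_value` and `false_alarm_cost` is exactly where the "passionless exercise" objection bites: the choice of weights is a value judgment, not a measurement.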
Part 3 Conclusion
All bibliography citations in Part 1