
Prediction without accountability, Part 3


Part 2

Tetlock's attack on "Thinking the Right Way" is no less demanding. He dispenses with the easy assumption of "close ties between correspondence and coherence/process indicators of good judgment, between getting it right and thinking the right way," and holds that investigators must begin to "array coherence/process indicators along a rough controversy continuum anchored at one end by widely accepted tests and at the other by bitterly contested ones":

  1. At the close-to-slam-dunk end, we find violations of logical consistency so flagrant that few rise to their defense. The prototypic tests involve breaches of axiomatic identities within probability theory.
  2. In the middle of the continuum, we encounter consensus on what it means to fail coherence/process tests but divisions on where to locate the pass-fail cutoffs.
  3. At the controversial end of the continuum, competing schools of thought offer unapologetically opposing views on the standards for judging judgment.

No wonder that the various sides (note that I did not say opposing sides, since large issues have many protagonists) continue to claim victory, and that forecaster and evaluator alike have difficulty assessing accuracy, both within their field of study and in the court of public opinion:

To qualify as a good judge within a Bayesian framework [those who predict] must own up to one's reputational bets. [The] core idea is a refinement of common sense. Good judges are good belief updaters who follow through on the logical implications of reputational bets that pit their favorite explanations against alternatives: if I declare that x is .2 likely if my "theory" is right and .8 likely if yours is right, and x occurs, I "owe" some belief change.

In principle, no one disputes we should change our minds when we make mistakes. In practice, however, outcomes do not come stamped with labels indicating whose forecasts have been disconfirmed.
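The arithmetic behind the reputational bet quoted above can be sketched with Bayes' rule. This is only an illustration, not Tetlock's own procedure: the function name `posterior_after_bet` and the starting priors of 0.5 and 0.9 are assumptions made for the example, while the .2 and .8 likelihoods come from the quoted passage.

```python
def posterior_after_bet(prior_mine, p_x_if_mine, p_x_if_yours):
    """Return P(my theory | x occurred) for two rival, exhaustive theories.

    prior_mine    -- my confidence in my own theory before x is observed
                     (an assumed input; the quoted passage does not give one)
    p_x_if_mine   -- probability I assigned to x if my theory is right
    p_x_if_yours  -- probability I assigned to x if the rival theory is right
    """
    prior_yours = 1.0 - prior_mine
    # Total probability of x under both theories, weighted by the priors.
    evidence = prior_mine * p_x_if_mine + prior_yours * p_x_if_yours
    return prior_mine * p_x_if_mine / evidence


# The bet from the quotation: x is .2 likely on my theory, .8 likely on yours.
even_odds = posterior_after_bet(0.5, 0.2, 0.8)   # assumed prior of 0.5
confident = posterior_after_bet(0.9, 0.2, 0.8)   # assumed prior of 0.9

print(f"prior 0.50 -> posterior {even_odds:.2f}")
print(f"prior 0.90 -> posterior {confident:.2f}")
```

Under these assumptions, an analyst who started at even odds should fall to 0.20 confidence once x occurs, and one who started at 0.90 should fall to roughly 0.69. The size of the belief change "owed" depends on the prior, but the 4:1 likelihood ratio implied by the bet fixes its direction and its minimum scale.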

Tetlock spends much time outlining how forecasters rewrite or justify their predictions in order to continue to claim victory, starting with the "frequency and self-serving selectivity" of their bet rewriting and the revisionist scale of the rewrites:

A balanced assessment [concedes] that Bayesians can no more purge subjectivity from coherence assessments of good judgment than correspondence theorists can ignore complaints about the scoring rules for forecasting accuracy. But that does not mean we cannot distinguish desperate patch-up rewrites that delay the day of reckoning for bankrupt ideas from creative rewrites that stop us from abandoning good ideas…

Shifting from forward-in-time reasoning to backward-in-time reasoning, we relied on turnabout thought experiments to assess the willingness of analysts to change their opinions on historical counterfactuals. The core idea is, again, simple. Good judges should resist the temptation to engage in self-serving reasoning when policy stakes are high and reality constraints are weak. And temptation is ubiquitous. Underlying all judgments of whether a policy was shrewd or foolish are hidden layers of speculative judgments about how history would have unfolded had we pursued different policies. We have warrant to praise a policy as great when we can think only of ways things could have worked out far worse, and warrant to call a policy disastrous when we can think only of ways things could have worked out far better. Whenever someone judges something a failure or success, a reasonable rejoinder is: "Within what distribution of possible worlds?"

From my systems point of view, "success" depends upon how the problem space is bounded in both time and effects. Apply too short a time span or too few parameters for measuring success, and one will drive for - and too often achieve - results that prove suboptimal when measured over the longer term. I habitually see the failure to anticipate secondary and tertiary effects as one of the greatest errors in both event planning and forecasting; its results are the 'unintended consequences' known as blowback.

Turnabout thought experiments gauge the consistency of the standards that we apply to counterfactual claims. We fail turnabout tests when we apply laxer standards to evidence that reinforces as opposed to undercuts our favorite what-if scenarios… A balanced assessment here requires confronting a dilemma: if we only accept evidence that confirms our worldview, we will become prisoners of our preconceptions, but if we subject all evidence, agreeable or disagreeable, to the same scrutiny, we will be overwhelmed. As with reputational bets, the question becomes how much special treatment of favorite hypotheses is too much. And, as with reputational bets, the bigger the double standard, the greater are the grounds for concern.

Problems with hedgehogs:

Hedgehogs are typically embedded in political movements or theoretical movements and they typically have people who will back them up. They can fall back on a base of supporters who will help them generate various types of excuses or belief system defenses that will neutralize the unexpected evidence. So they'll be able to argue, "Well, what I predicted didn't happen, but it will happen soon," or, "I predicted that country X had weapons of mass destruction, and, well, it appears that it didn't, but it was the right mistake to have made."

It is true that if you wanted to identify the experts who have made the most spectacularly far-sighted predictions over the last 50 years, the hedgehogs would be disproportionately represented. But if you were computing batting averages, the hedgehogs would be clearly statistically inferior to the foxes.

Tetlock's conclusion:

The dominant danger remains hubris, the mostly hedgehog vice of closed-mindedness, of dismissing dissonant possibilities too quickly. But there is also the danger of cognitive chaos, the mostly fox vice of excessive open-mindedness, of seeing too much merit in too many stories. Good judgment now becomes a metacognitive skill--akin to "the art of self-overhearing." Good judges need to eavesdrop on the mental conversations they have with themselves as they decide how to decide, and determine whether they approve of the trade-offs they are striking in the classic exploitation-exploration balancing act, that between exploiting existing knowledge and exploring new possibilities… From a policy perspective, there is value in using publicly verifiable correspondence and coherence benchmarks to gauge the quality of public debates. The more people know about pundits' track records, the stronger the pundits' incentives to compete by improving the epistemic (truth) value of their products, not just by pandering to communities of co-believers.

Tetlock's call for monitoring:

I think on balance, it would be a good idea to give some serious thought to systematically monitoring political punditry. I think we monitor professionals in many other spheres of life. I think we monitor weather forecasters, we increasingly monitor stock market analysts, we sometimes monitor doctors. I don't think it's unreasonable to suppose that when people offer opinions on extremely consequential issues, like whether or not to go to war or whether or not to have welfare reform, or tax policy, trade policy, it's not unreasonable to ask what are their predictive track records in the past as a guide for how much credibility to attach to what they're saying in the present.

I agree with Tetlock's closing wish that "we as a society would be better off if participants in policy debates stated their beliefs in testable forms"—that is, as probabilities—"monitored their forecasting performance, and honored their reputational bets." I would augment this rating mechanism with something akin to a "wisdom of crowds" virtual stock market (VSM) tracker where the educated everyman can make their wager against the experts.
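Once beliefs are stated as probabilities, a track record can be scored mechanically. The sketch below uses the Brier score (mean squared error between stated probability and outcome), a standard scoring rule that I am assuming here for illustration; the text above does not specify which rule a monitoring scheme would use. The two forecasters and their numbers are invented, not data from Tetlock's study.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    forecasts -- probabilities in [0, 1] that each event would occur
    outcomes  -- 1 if the event occurred, 0 if it did not
    Lower is better: 0.0 is perfect, and always saying 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical track records over the same four events (invented numbers).
outcomes = [1, 0, 0, 0]
bold_pundit = [1.0, 1.0, 0.0, 1.0]      # always certain, right on two of four
cautious_pundit = [0.7, 0.4, 0.2, 0.3]  # hedged probabilities

print(f"bold:     {brier_score(bold_pundit, outcomes):.3f}")
print(f"cautious: {brier_score(cautious_pundit, outcomes):.3f}")
```

In this made-up example the bold pundit scores 0.500 and the cautious one 0.095. A rule of this kind rewards the occasional spectacular call but penalizes confident misses heavily, which is one way a "batting average" comparison could be made concrete.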

All bibliography citations in Part 1

Gordon Housworth
