Superforecasting: Book Review

Superforecasting: The Art and Science of Prediction (2015), Philip Tetlock & Dan Gardner. This is more or less a follow-up to Tetlock's Expert Political Judgment (EPJ), which used tournaments to measure people's (some expert, some not) ability to establish accurate probabilities of possible future events. The EPJ results were disappointing, in that the average prediction was about as good as a dart-throwing chimp (that is, no better than random). One reason was the relatively long time periods involved (prediction accuracy declines as the time horizon lengthens; meteorologists are very good weather predictors for a day or two, then accuracy declines). He found that self-identified hedgehogs were poor predictors and foxes much better. That was the starting point for analyzing what it takes to increase forecast accuracy, conveniently funded by IARPA, a unit of the federal intelligence community. Tournaments were again used, but forecasts were all short-term. The authors did not stress hedgehog/fox differences, but instead tried to determine the specific factors that went into relatively accurate forecasts (granted, the superforecasters probably were all foxes). They ended up with a checklist that proved a useful starting point and improved forecasts about 10% (summarized below).

Following up on EPJ, the average forecaster was still in the dart-throwing chimp range, even though the questions were short-term. Long-term nonlinear forecasts are difficult. Meteorologist Edward Lorenz came up with what became chaos theory: small changes in initial conditions produce vastly different weather patterns (the butterfly effect)--the opposite of the Laplacean clock as a symbol of perfect predictability. "How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances" (p. 12). Tetlock wants to know what factors improve forecasts, by how much, and how good forecasts can become with best practices. This was done under an IARPA forecasting tournament, with Tetlock's Good Judgment Project one of five competing teams. The keys were open-minded, careful, curious and self-critical thinking, plus focus.

Chapter 2 starts with the history of medicine, where Galen and other doctors described what they believed and how they treated patients, but not whether they were right. Some patients recovered; those that didn't were "incurable." Much of 21st century medicine resembles 19th century medicine, with theories, assertions and arguments. The first experiment was apparently in 1747, when British ship's doctor James Lind gave different treatments to sailors with scurvy. Only those eating citrus fruit recovered. However, the use of randomized trial experiments with careful measurements had to wait for the 20th century. [Too bad so many doctors suffered from the "God complex."] This leads to the importance of introspection; Tetlock uses the two-system model (written about by Kahneman in Thinking, Fast and Slow), with system 1 the automatic perceptual operations (which jump to conclusions with little evidence--availability bias is a problem here, often as an attribute substitution) and system 2 conscious thought, where introspection takes place (first out is confirmation bias). Tetlock calls the two "blink versus think." Note that pattern recognition can lead to accurate intuition (but comes at the cost of false positives). The Cognitive Reflection Test shows that most people are not very reflective.

Chapter 3, Keeping Score. No one had ever tested the accuracy of political experts. The first problem is the timeline: when and/or over what time period will this happen. The Coordinator of Information was formed in 1941, became the Office of Strategic Services (OSS), and later the CIA. Intelligence analysis is central: the methodical examination of information collected by spies and surveillance. The language of an estimate could be so vague that it caused misunderstanding. Major failings included the Bay of Pigs, where "fair chance" meant perhaps one in three but sounded like nearly certain to Kennedy (note really bad framing). People have trouble with probabilistic language. Glenn Brier (1950) developed Brier scores to measure the distance between a forecast and what actually happened. A perfect forecast has a Brier score of 0, a 50-50 forecast scores .5, and a totally wrong forecast scores 2. Another benchmark is who can beat the consensus forecast. The key factor was how the forecasters thought. Hedgehogs organized their thinking around big ideas, and problems had to fit into expected cause-effect templates (e.g., Larry Kudlow's supply-side economics--note: greater fame meant less accurate forecasts). Foxes were pragmatic, using multiple analytical tools based on the specific problem and using as many sources as they could (dragonfly eye effect); in summary, they aggregate. They also think differently in different situations.
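As a rough illustration of the scoring rule described above, here is a minimal Python sketch of the original two-category Brier score for yes/no questions. The function name brier_score and the sample numbers are made up for illustration; they are not taken from the book. The sketch assumes each forecast is the probability given to the event happening and each outcome is 1 (happened) or 0 (did not).

```python
def brier_score(forecasts, outcomes):
    """Original two-category Brier score for yes/no questions.

    forecasts: probabilities (0..1) assigned to the event happening
    outcomes:  1 if the event happened, 0 if it did not
    Returns a value from 0 (perfect) to 2 (confidently wrong every time).
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    total = 0.0
    for p, o in zip(forecasts, outcomes):
        # Squared error on the "yes" side plus squared error on the "no" side.
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)


# Illustrative (made-up) forecasts: 70% on something that happened,
# 20% on something that did not.
example = brier_score([0.7, 0.2], [1, 0])
print(round(example, 3))
```

Lower is better. Hedging everything at 50-50 gives .5 no matter what happens, so a forecaster has to commit to informative probabilities, and be right, to score below that.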

National Intelligence Estimates are consensus views of the intelligence community (IC). The Iraq weapons of mass destruction estimate leading to the 2003 invasion was the worst intelligence failure in modern history. The analysts never explored whether they could be wrong. IARPA was founded in 2006 to make the IC more effective; e.g., intuitive appeal is not good enough. Accuracy metrics are necessary to hold the IC accountable. The IC focused on process rather than accuracy. Mauboussin: slow regression to the mean is seen in activities dominated by skill, faster regression in games of chance. Superforecasters got better. Robert McNamara, famous as Defense Secretary during the Vietnam War, wrote: "We failed to analyze our assumptions critically, then or later." Judgments should be critiqued to spot flaws and offer other perspectives, including point-counterpoint discussions. Note that "openness to experience" includes preference for variety and intellectual curiosity; also the importance of "active open-mindedness." "The great majority of [superforecasters] are simply the product of careful thought and nuanced judgment" (p. 129). Consider "ignorance priors" (p. 135).

Probability theory started in 1713 with Jakob Bernoulli's publication. People treat certainty as more important than probability, especially when it comes to changes (a change within the range of probabilities counts for less than a change to certainty). There are two kinds of uncertainty: epistemic uncertainty is something you don't know but that is knowable; aleatory uncertainty is not knowable. Thus, probabilistic thinking is essential to accurate forecasting. Summary of process:

"Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others--and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability. ... They are judgments that are based on available information and that should be updated in light of changing information" (p. 153). Question of giving proper weight to new information, which can be under- or over-reaction. Japanese internment of 1942 (p. 160). Public commitment results in resistance to change. Dilution effect and stereotypes; estimates based on tidbits of information, sometimes irrelevant. Small corrections are suggested and frequent updating.

Thomas Bayes and Bayes' theorem (p. 169): P(H|D) / P(-H|D) = [P(D|H) / P(D|-H)] x [P(H) / P(-H)], that is, posterior odds = likelihood ratio x prior odds, where H is the hypothesis, -H its negation, and D the new data. New beliefs depend on (1) the prior belief times (2) the diagnostic value of the new information. Emphasis on constantly updating in proportion to the weight of evidence.
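To make the odds form of the theorem concrete, here is a small Python sketch that converts a prior probability to odds, multiplies by the likelihood ratio, and converts back. The function name update_probability and the numbers in the example are hypothetical, chosen only to show the arithmetic; nothing here is taken from the book.

```python
def update_probability(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds.

    prior:              P(H), strictly between 0 and 1
    p_data_given_h:     P(D | H)
    p_data_given_not_h: P(D | not H), must be greater than 0
    Returns the posterior probability P(H | D).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if p_data_given_not_h <= 0:
        raise ValueError("P(D | not H) must be positive")
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_data_given_h / p_data_given_not_h
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)


# Made-up example: start at 50%, then see evidence that is three times
# as likely if the hypothesis is true (0.75) as if it is false (0.25).
print(update_probability(0.5, 0.75, 0.25))
```

When the evidence is equally likely under both hypotheses the likelihood ratio is 1 and the belief does not move, which is the sense in which updating should be proportional to the diagnostic value of the news.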

Chapter 9 is called "Perpetual Beta," the idea that the thought process is constantly updating and does not reach a definitive conclusion (until after the fact). Forecasters can have a "fixed mindset" or a "growth mindset"; the latter requires learning and hard work, plus learning from failure. Time lag affects feedback, including hindsight bias (note Baruch Fischhoff, p. 183). Grit is passionate perseverance toward long-term goals. Teams can sharpen judgment (e.g., constructive confrontation), or protocol can stifle critical thinking. For the Tetlock group, teams were 23% more accurate than individuals. They also beat prediction markets.

Moltke's legacy (p. 213): In war, everything is uncertain. Never entirely trust your plan. Petraeus believed in intellectual flexibility and in bridging the divide between doers and thinkers. Tension between control and innovation. Don't conflate facts and values. Kahneman: time frame is the scope of the forecast, with widespread scope insensitivity (when can be a difficult question). Black swan of Nassim Taleb: black swans alone determine history (which suggests that short-term forecasts are irrelevant). Importance of consequences--the impact of a black swan takes time to develop (think storming the Bastille). Taleb: the world is more volatile than we realize, with risks of miscalculation. Note: vague expectations and fuzzy thinking are not helpful (they can never be proven wrong).

Government policy quote (p. 256): "evidence-based policy is a movement modeled on evidence-based medicine, with the goal of subjecting government policies to rigorous analysis so that legislators will actually know--not merely think they know--whether policies do what they are supposed to do." "What would help is a sweeping commitment to evaluation: keep score. Analyze results. Learn what works and what doesn't. But that requires numbers" (p. 258). "Bayesian question clustering": answering unscorable big questions with clusters of little ones.

2008 economic turmoil: Keynesians versus Austrians: big government spending vs. austerity. Strident voices dominate debate.

Summary:

Philosophical outlook: caution, humility, events are non-deterministic.

Thinking style: actively open-minded, intellectually curious, introspective and self-critical, comfortable with numbers.

Methods of forecasting: pragmatic, analytical, dragonfly-eyed, probabilistic, thoughtful updating, aware of cognitive and emotional biases.

Work ethic: growth mindset, grit (pp. 190-192).

Appendix checklist:

1. Triage: focus on Goldilocks zone of difficulty for best payoff; i.e., do not waste time on the unpredictable.

2. Break intractable problems into sub-problems. Separate out knowable and unknowable parts and be careful about assumptions.

3. Strike a balance between inside and outside views. Outside views include comparison classes/events. [The outside view should come first, because of anchoring: estimates start with some number and then adjust.]

4. Balance under- and overreaction to new evidence (incremental belief updating).

5. Look for clashing causal forces: argument versus counterargument (thesis, antithesis, synthesis).

6. Determine the right level of doubt (skepticism). George Tenet's "slam dunk" about Iraq was irresponsible.

7. Include both prudence and decisiveness (under- versus overconfidence).

8. Look for errors behind earlier mistakes, but beware of hindsight bias. Feedback from multiple sources is valuable.

9. Use team management, especially alternate perspectives and including constructive confrontation and precision questioning.

10. Practice is essential (remember Gladwell's 10,000-hour rule): deep, deliberative practice with feedback.