Thinking, Fast and Slow

=== II – Heuristics and Biases ===
 
🔢 '''10 – The Law of Small Numbers.''' A well-circulated statistical vignette maps kidney cancer across the 3,141 counties of the United States and finds that the very lowest rates cluster in sparsely populated, rural, largely Republican counties—until a second pass shows that the very highest rates cluster there too. The puzzle tempts causal stories about lifestyle or environment, but the simplest explanation is sample size: small populations produce more variable extremes. Kahneman ties this to his 1971 work with Amos Tversky at the Hebrew University, showing that people—researchers included—expect small samples to mirror the parent population far too closely. The same mistake fueled an education fad: because the top‑scoring schools in national comparisons were often small, a major foundation spent heavily to create small high schools; overlooked was that the worst performers were often small as well. In hiring, medicine, and investing, intuitive pattern‑spotting prefers neat causes over noisy denominators, so clusters and streaks are overread as meaningful. Even the statistically trained researchers Kahneman and Tversky surveyed gave poor advice about sample sizes for replication studies, revealing how seductive the error can be. The recurring symptom is overconfidence attached to striking but unrepresentative data. The chapter’s point is that intuitive judgment underestimates how wildly results can swing when samples are small. In the book’s larger frame, System 1 hungers for causal tales, and only a numerate System 2 that attends to sample size can keep randomness from being mistaken for insight.
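The sampling logic is easy to verify directly. The sketch below (a minimal simulation with hypothetical population sizes and a hypothetical incidence rate, not the actual county data) gives every county the same true rate, yet the smallest counties still supply both tails of the observed-rate distribution:

<syntaxhighlight lang="python">
# Minimal sketch: identical true risk everywhere; binomial sampling noise
# alone makes the smallest counties produce the highest AND lowest rates.
# All numbers are hypothetical, not the real epidemiological data.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 1e-4
populations = rng.choice([1_000, 10_000, 100_000, 1_000_000], size=3141)
cases = rng.binomial(populations, true_rate)   # chance alone drives variation
rates = cases / populations

order = np.argsort(rates)
extremes = np.concatenate([order[:20], order[-20:]])   # both tails
small = (populations[extremes] <= 10_000).sum()
print(f"{small} of 40 extreme-rate counties are small")  # typically all 40
</syntaxhighlight>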
 
⚓ '''11 – Anchors.''' Tversky and Kahneman’s classic 1974 demonstration began with a rigged “wheel of fortune” that stopped on 10 or 65 before participants estimated the percentage of African nations in the United Nations; those who saw the higher number gave higher guesses. Similar pulls show up outside the lab: experienced German judges rendered stiffer sentences after exposure to high, irrelevant numbers—whether a prosecutor’s demand or even random dice—than after low ones. Market behavior is not immune: in Dan Ariely, George Loewenstein, and Drazen Prelec’s experiments, the last two digits of participants’ Social Security numbers nudged how much they were willing to pay for wine, chocolate, and other goods. Two mechanisms are at work. One is deliberate “adjustment”: people start from the anchor and move insufficiently. The other is automatic “selective accessibility”: the anchor primes thoughts that make anchor‑consistent values feel plausible. Because anchors feel like helpful starting points, people rarely audit their origins or strength, and confidence in the final number can be high even when the starting number was arbitrary. The chapter’s lesson is that numbers we meet first shape numbers we choose next, often without awareness. Within the book’s theme, System 1 is easily primed by an anchor while System 2, averse to effort, adjusts too little unless it deliberately searches for independent evidence.
 
📊 '''12 – The Science of Availability.''' In a 1973 paper, Tversky and Kahneman asked whether more English words begin with the letter K or have K as the third letter; because words that start with K come to mind more easily, many people judged that category as larger, even though the opposite is true in typical texts. In another experiment, listeners heard lists mixing famous and less famous names—say, 19 well‑known men and 20 obscure women—and later estimated that the gender associated with famous names had appeared more often. A later program of studies led by Norbert Schwarz showed that ease of retrieval can outweigh content: when people listed 6 examples of their own assertive behavior, they felt more assertive than those asked to list 12, because producing a dozen felt difficult and the mind used that difficulty as information. The same metacognitive cue appears across domains: repeated headlines, vivid images, and clean typography make claims feel truer because they are processed fluently. Availability shapes frequency and probability judgments not by counting cases, but by sampling what comes quickly to mind and how easy that felt. It is a helpful shortcut in familiar settings, yet it skews perception whenever salience, recency, or media coverage distort what is retrievable. The broader message is that minds mistake the experience of recall for a property of the world. In the book’s architecture, System 1 turns fluency into confidence, and only a reflective System 2 can ask whether what was easy to remember is also representative.
 
⚠️ '''13 – Availability, Emotion, and Risk.''' Paul Slovic and colleagues documented the “affect heuristic,” showing that when a technology or activity feels good, people judge its benefits high and its risks low, and when it feels bad the pattern reverses—an inverse link driven by feeling rather than analysis. After disasters, economist Howard Kunreuther observed surges in insurance purchases that fade as the vividness of recent losses recedes, leaving communities underprotected before the next event. Gerd Gigerenzer’s analysis of U.S. travel after September 11, 2001, illustrated “dread risk”: many avoided flying—a low‑probability, high‑consequence hazard—and drove instead, contributing to additional traffic fatalities in the months that followed. Cass Sunstein labeled the mental move behind such reactions “probability neglect”: once emotion is high, tiny probabilities no longer feel tiny, and the search for worst cases overwhelms calibration. The mechanism is a fast substitution: the mind answers “How do I feel about this?” in place of “What is the likelihood and magnitude?”, then treats the feeling as if it were evidence. Vivid images, gripping narratives, and repetition amplify availability, which then steers policies and personal choices toward dramatic protections and away from base‑rate risks. The chapter’s thrust is that risk perception is often about affective pictures rather than arithmetic. In the book’s terms, System 1’s feelings flood judgment unless System 2 slows down to separate intensity of emotion from the size of the hazard.

🎓 '''14 – Tom W’s Specialty.''' In 1973, Daniel Kahneman and Amos Tversky published a set of experiments in Psychological Review built around a fictional graduate student named Tom W, whose personality sketch sounded like a stereotypical computer scientist. One group of participants estimated base rates for nine fields of study among first‑year U.S. graduate students; another judged how similar Tom W was to typical students in those fields; a third predicted his field. Despite knowing that large programs like education and the humanities enroll many more students than computer science, many respondents ranked Tom W as more likely to be in computer science because the description fit the stereotype. The experiment showed how people leap from a vivid description to a probability judgment without integrating prior odds. Even when base rates were made explicit, judgments gravitated toward resemblance, not frequency. The pattern held whether answers were ranks or numerical probabilities, demonstrating that the mind privileges how well a case fits a category over how many such cases exist. Bayes’s rule would combine prior enrollment shares with the diagnostic value of the description; instead, judgments treated the description as if it were fully reliable. The broader idea is that representativeness drives predictions, while base rates are neglected when they feel merely statistical. In the book’s terms, System 1 matches a story to a stereotype and System 2 often fails to correct for the weak link between a sketch and the underlying distribution.
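The Bayesian combination that respondents skipped can be made concrete. In the sketch below, the base rate and the diagnostic values are hypothetical, chosen only to illustrate the mechanics: even a sketch that fits computer science far better than other fields leaves the small field an unlikely answer.

<syntaxhighlight lang="python">
# Hypothetical numbers, for illustration only: Bayes' rule weighs the
# description's fit against the prior enrollment share it competes with.
def posterior(prior, p_sketch_if_field, p_sketch_if_other):
    """P(field | sketch) for one field versus all other fields pooled."""
    hit = prior * p_sketch_if_field
    return hit / (hit + (1 - prior) * p_sketch_if_other)

prior_cs = 0.03   # assumed share of students in computer science
fit_cs = 0.80     # assumed: sketch strongly resembles a CS student
fit_other = 0.10  # assumed: sketch fits other fields poorly

print(round(posterior(prior_cs, fit_cs, fit_other), 2))  # 0.2
</syntaxhighlight>

A resemblance judgment alone would put the probability near 0.8; under these assumptions the prior pulls it down to roughly 0.2.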
 
👩 '''15 – Linda: Less is More.''' In 1983, Tversky and Kahneman’s Psychological Review paper presented “Linda,” a 31‑year‑old, single, outspoken philosophy major concerned with social justice, and asked which is more probable: Linda is a bank teller, or Linda is a bank teller and active in the feminist movement. Across samples, many judged the conjunction more likely than the simpler statement, a logical error because adding details cannot increase probability. Joint and separate evaluations yielded the same pattern: plausibility and story fit overrode set inclusion. Frequency formats (“out of 100 people like Linda…”) reduced, but did not eliminate, the mistake, showing that the error is resilient to rewording. The case also revealed how ranking tasks amplify the pull of representativeness, as people sort options by narrative coherence. Critics proposed alternative framings, but the conjunction effect persists whenever a detailed story seems truer than a bare label. The example illustrates how the mind confuses plausibility with probability and treats richer descriptions as better answers even when they are strictly less likely. The central mechanism is attribute substitution: the question “How likely?” is quietly replaced by “How much does this fit the stereotype?”. Within the book’s theme, System 1 rewards a compelling story, and only a statistics‑minded System 2 reins in the appeal of extra detail.
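The logical constraint the respondents violated fits in one line. For any events ''A'' (Linda is a bank teller) and ''B'' (she is active in the feminist movement),

<math display="block">P(A \land B) = P(A)\,P(B \mid A) \le P(A),</math>

since <math>P(B \mid A) \le 1</math>. However much the added detail improves the story, the conjunction can never be more probable than the bare label.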
 
🔗 '''16 – Causes Trump Statistics.''' A well‑known base‑rate puzzle asks about a night‑time hit‑and‑run in a city where 85% of cabs are Green and 15% Blue, and a tested witness is 80% accurate at identifying colors; most people say the cab was Blue with 80% probability, ignoring the population split that yields a Bayesian answer near 41%. When the scenario is changed so that both firms are the same size but Green cabs cause about 85% of accidents, judgments swing toward the base rate because it now feels like a causal explanation. The numbers in the two stories are mathematically equivalent, but the mind treats them differently depending on whether they imply a mechanism. People readily weave stereotypes from causal base rates (“Green drivers are reckless”) and discount statistical base rates that lack a story. This preference for causes shows up in legal reasoning, health scares, and everyday attribution, where a single vivid observation trumps a large neutral denominator. The contrast reveals why neutral prevalence data are often sidelined and “pattern plus intent” feels decisive. The lesson is not to reject causes, but to force statistical and causal information to meet on the same page before deciding. In the book’s framework, System 1 privileges narratives that link events, while System 2 must bring base rates back into the judgment when stories run ahead of evidence.
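Plugging the vignette’s own numbers into Bayes’s rule gives the answer cited above:

<math display="block">P(\text{Blue} \mid \text{witness says Blue}) = \frac{0.80 \times 0.15}{0.80 \times 0.15 + 0.20 \times 0.85} = \frac{0.12}{0.29} \approx 0.41.</math>

The witness’s 80% accuracy is real evidence, but it must be weighed against the fact that only 15% of the cabs are Blue to begin with.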
 
📉 '''17 – Regression to the Mean.''' While working with Israeli Air Force flight instructors, I heard a confident claim that harsh criticism improves performance whereas praise makes it worse—based on observing cadets who often faltered after a superb maneuver and improved after a poor one. The pattern was real, but the explanation was not: performances that include luck tend to be followed by outcomes closer to average, regardless of what instructors say or do. The same tendency appears in athletics (“cover jinxes”), sales streaks, and test–retest scores, where extreme results are naturally followed by less extreme ones. Sir Francis Galton quantified this in 1886 with parent–child height data, showing that exceptional parents have children closer to the population mean. Regression is easiest to miss when attention is fixed on individual cases and causal stories—talent, effort, motivation—while variability and noise are overlooked. Punishment then seems to work and reward to fail because changes after extremes are misread as effects of feedback rather than statistics. Good evaluation requires separating skill from luck and comparing outcomes to appropriate baselines over time. The broader point is that human perception spots patterns and seeks causes even when randomness is doing most of the work. In this book’s terms, System 1 insists on a tale for every rise and fall, and only a statistical System 2 corrects for how noise drags extremes back toward the mean.
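The purely statistical character of the effect is easy to reproduce. In the sketch below (arbitrary parameters, no real flight data), each cadet’s score mixes a stable skill term with fresh luck on every attempt; the best first attempts are followed by nearer-average second attempts with no feedback involved:

<syntaxhighlight lang="python">
# Hypothetical model: score = stable skill + independent luck per attempt.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
skill = rng.normal(0, 1, n)
attempt1 = skill + rng.normal(0, 1, n)   # first maneuver
attempt2 = skill + rng.normal(0, 1, n)   # second maneuver, fresh luck

best = attempt1 >= np.quantile(attempt1, 0.95)  # the "superb" first tries
print(round(attempt1[best].mean(), 2))  # roughly 2.9: extreme first scores
print(round(attempt2[best].mean(), 2))  # roughly 1.5: regression, unprompted
</syntaxhighlight>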
 
🐎 '''18 – Taming Intuitive Predictions.''' Consider “Julie,” a precocious reader, and the task of predicting her college GPA years later: most people intuit a high number that matches the impression and ignore how weakly early reading predicts distant outcomes. A more accurate method starts with a baseline (the average GPA for comparable students), forms an intuitive estimate from the available cues, gauges the correlation between cue and target, and then moves only partway from the baseline toward the intuition. When the cue–outcome correlation is modest, extreme intuitive forecasts must be pulled back toward the mean; when it is near zero, the baseline rules. This approach reduces systematic over‑ and under‑shooting that comes from treating impressions as perfectly reliable. It also forces attention to the reference class—the distribution of outcomes for similar cases—rather than the singular story at hand. In hiring, admissions, and investing, the same discipline turns a compelling narrative into a tempered prediction that errs less and in both directions. The aim is not to silence intuition but to weight it by its proven validity, so strong evidence can still justify bold forecasts while weak evidence cannot. In the book’s larger frame, unchecked System 1 turns resemblance into certainty, and a deliberate System 2 restores calibration by anchoring forecasts to base rates and shrinking them by reliability.
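The four steps reduce to a single shrinkage formula: start at the baseline and move toward the intuition only as far as the cue–outcome correlation warrants. The numbers below are hypothetical stand-ins for the Julie example:

<syntaxhighlight lang="python">
# The chapter's recipe as a one-line formula (values are hypothetical).
def tempered_prediction(baseline, intuition, correlation):
    """Shrink an intuitive forecast toward the reference-class mean."""
    return baseline + correlation * (intuition - baseline)

avg_gpa = 3.0         # assumed mean GPA for comparable students
intuitive_gpa = 3.9   # the number that "matches the impression"
r = 0.30              # assumed correlation of early reading with adult GPA

print(round(tempered_prediction(avg_gpa, intuitive_gpa, r), 2))    # 3.27
print(round(tempered_prediction(avg_gpa, intuitive_gpa, 0.0), 2))  # 3.0
</syntaxhighlight>

With a correlation near zero the baseline rules; with a correlation near one, the intuitive forecast stands unchanged.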
 
=== III – Overconfidence ===
🪞 '''19 – The Illusion of Understanding.''' A glossy business‑press account of Google’s rise strings decisive hires, bold product calls, and near‑misses into a single, satisfying arc, giving readers the feeling that the company’s success was inevitable and decipherable. That feeling is a mirage built from selective facts, hindsight, and the halo effect, which credits leaders with foresight when results are good and faults them when results sour. Outcome knowledge narrows what once felt uncertain into a tidy plot, and WYSIATI—what you see is all there is—keeps inconvenient alternatives offstage. Phil Rosenzweig’s critique of management case studies shows how performance swings can flip narratives without changing the underlying practices, while regression to the mean disguises luck as a trend. We overrate stories that backfill clear causes, underrate noise, and then carry away lessons that travel poorly beyond the one story we just read. Confidence grows with coherence, not with evidence, so self‑assured punditry often reflects fluent storytelling rather than predictive skill. The core idea is that the mind prefers explanations that make past events feel necessary, and that preference feeds overconfidence about the future. The mechanism is narrative compression: System 1 stitches fragments into a single cause‑and‑effect line, and unless System 2 deliberately restores uncertainty and base rates, the story hardens into false understanding. ''These stories induce and maintain an illusion of understanding, imparting lessons of little enduring value to readers who are all too eager to believe them.''
 
✅ '''20 – The Illusion of Validity.''' Many decades ago, while serving in the Israeli Army’s Psychology Branch, I helped rate officer candidates in a “leaderless group challenge,” a British‑designed World War II exercise where eight strangers, stripped of insignia and tagged by number, had to shoulder a long log together and get it over a six‑foot wall without letting it touch. Under a scorching sun, my colleagues and I felt sure we could spot future leaders from a few minutes of talk, posture, and initiative. Follow‑ups showed our predictions barely beat chance, yet our confidence survived each new batch of evidence. The feeling came from a crisp story—visible traits seemed to map neatly onto military success—so our minds mistook coherence for validity, much like seeing the Müller‑Lyer illusion even after learning the lines are equal. Years later, a 1984 visit to a Wall Street firm revealed the same pattern in stock‑picking: enormous effort and training produced strong conviction without durable predictive edge. Across domains, high subjective confidence indicates a well‑fitted narrative more than a reliable forecast. The idea is that confidence is a feeling about a story’s internal fit, not a calibrated estimate of accuracy. The mechanism is selective coherence: System 1 locks onto a pattern and System 2, reluctant to audit, accepts it as skill unless hard feedback and statistics force revision. ''I was so struck by the analogy that I coined a term for our experience: the illusion of validity.''
 
➗ '''21 – Intuitions vs. Formulas.''' Princeton economist Orley Ashenfelter showed how a three‑variable weather rule—summer temperature, harvest rainfall, and prior winter rain—predicts the future prices of Bordeaux vintages with striking accuracy (correlation above .90), outdoing celebrated tasters years or decades later. Paul Meehl’s review of 20 studies had already found that simple statistical combinations routinely beat clinicians and counselors at predicting grades, parole violations, pilot training success, and more. The same lesson appears in the delivery room: Virginia Apgar’s five‑item, 0‑to‑2 scoring checklist standardized newborn assessment and helped cut infant mortality by turning scattered impressions into a consistent rule. Robyn Dawes pushed further, showing that “improper” models with equal weights often match or beat optimally weighted regressions and easily outperform unaided judgment. Humans are inconsistent and context‑sensitive—mood, order effects, and stray cues shift conclusions—whereas formulas return the same answer for the same inputs and don’t tire or improvise. People still resist algorithms, mistaking the vivid feel of expertise for proof of predictive power and clinging to the rare “broken‑leg” exception. The idea is that when environments are noisy and validity is low, disciplined rules deliver more reliable forecasts than expert impressions. The mechanism is noise reduction and proper weighting: System 2 embeds expertise into transparent, repeatable formulas that tame intuitive inconsistency and overfitting. ''The research suggests a surprising conclusion: to maximize predictive accuracy, final decisions should be left to formulas, especially in low‑validity environments.''
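Dawes’s “improper” equal-weights idea is simple enough to sketch (hypothetical data below; Ashenfelter’s actual model used fitted regression weights): standardize each cue and add them with unit weights, so the same inputs always yield the same forecast:

<syntaxhighlight lang="python">
# Equal-weights sketch: z-score each cue, sum with unit weights, no fitting.
import numpy as np

def equal_weight_score(cues):
    """cues: (n_cases, n_cues) array -> one composite score per case."""
    z = (cues - cues.mean(axis=0)) / cues.std(axis=0)  # put cues on one scale
    return z.sum(axis=1)                               # unit weights

rng = np.random.default_rng(0)
# Hypothetical cues per vintage (e.g., summer temperature, winter rain),
# each already oriented so that higher values mean better quality.
cues = rng.normal(size=(5, 3))
print(np.argsort(-equal_weight_score(cues)))  # rank vintages, best first
</syntaxhighlight>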
 
🧠 '''22 – Expert Intuition: When Can We Trust It?'''