The Forecaster Test

Measure the cognitive skills that actually predict forecasting ability

Intelligence matters for many life outcomes, but IQ only weakly predicts forecasting ability (r ≈ .2). More counterintuitively, Tetlock's 20-year study of 82,361 predictions found that specialists performed no better in their own field than outside it, and "hedgehog" experts who relied heavily on domain expertise did worse when predicting within their specialty. One explanation is that expertise can breed overconfidence rather than accuracy.

What matters more is what Mellers et al. called "good judgment": a combination of probabilistic thinking, calibration, and willingness to update beliefs. This assessment isn't an IQ test. It measures the cognitive skills that predict whether you'll be good at anticipating the future.

How it works

1. Take the Assessment — Measure your baseline across Bayesian reasoning, diagnostic thinking, cognitive reflection, and open-minded thinking.

2. Make Predictions — Forecast real-world events drawn from prediction markets. Your predictions are stored and scored when events resolve.

3. Track Your Accuracy — See how your judgment score correlates with actual forecasting performance over time.

Participants
Predictions Made
Questions Resolved

Assessment

Based on research from Tetlock, Mellers, and Baron

This assessment measures four dimensions of judgment quality that predict forecasting accuracy. Bayesian problems are scored with partial credit.

Time: ~5-7 minutes

Section 1 Bayesian Reasoning
These problems test how well you update probability estimates given new evidence. Partial credit is given.
Problem 1 of 2

Base rate: 1% of women age 40 have breast cancer.

If cancer present: 80% chance of positive mammogram.

If no cancer: 9.6% chance of positive mammogram (false positive).

A woman gets a positive mammogram. What's the probability she has cancer?
%
Problem 2 of 2

In a population: 10 of 1,000 have HIV.

The test correctly flags all 10 people who have HIV.

Test also flags 40 of 990 uninfected people (false positives).

Someone tests positive. What's the probability they have HIV?
%
Section 2 Diagnostic Reasoning
This tests your ability to identify which information is most useful for decisions.
Test Selection

A patient has 80% chance of Disease A, 20% chance of Disease B.

Test 1: 90% positive if Disease A; 20% positive if not A.

Test 2: 90% positive if Disease B; 10% positive if not B.

Which test provides more diagnostic information?
Section 3 Cognitive Reflection
Take a moment to verify your answer.
Problem 1 of 2
A bat and ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost?
$
Problem 2 of 2
A lily pad patch doubles daily. It covers the lake on day 48. When did it cover half the lake?
days
Section 4 Thinking Style
Rate your agreement with each statement on a scale from Strongly Disagree to Strongly Agree.
Statement 1 of 11
I tend to make decisions quickly rather than deliberating for a long time.
Statement 2 of 11
People should take into consideration evidence that goes against their beliefs.
Statement 3 of 11
I prefer explanations that tie everything together with one big idea.
Statement 4 of 11
Changing your mind is a sign of weakness.
Statement 5 of 11
I often seek advice from others before making important decisions.
Statement 6 of 11
People should search actively for reasons why they might be wrong.
Statement 7 of 11
Most important problems have multiple partial causes rather than one root cause.
Statement 8 of 11
I find it energizing to discuss controversial topics.
Statement 9 of 11
It is important to be loyal to your beliefs even when evidence is brought against them.
Statement 10 of 11
Specialists generally make better predictions in their field than generalists.
Statement 11 of 11
I enjoy debates and arguments.
Section 5 About You
Optional, but it helps us analyze what predicts good judgment.
Highest Education Completed
Primary Field
Political Orientation
Very Liberal to Very Conservative
Prediction Market Experience
Familiar with Tetlock's Superforecasting Research?
Section 6 Scientific Calibration
Each study below was later tested in a large, pre-registered replication attempt. Estimate the probability that the original finding successfully replicated.
Study 1 of 6

Ego Depletion (1998)

Participants who first resisted eating cookies (exerting self-control) gave up faster on a subsequent puzzle task than those who hadn't resisted temptation. The researchers concluded that willpower is a limited resource that gets depleted with use.

Probability this replicated?
50%
Study 2 of 6

Facial Feedback (1988)

Participants who held a pen in their teeth (forcing a smile-like expression) rated cartoons as funnier than those who held the pen with their lips (preventing smiling). The researchers concluded that facial expressions can directly influence emotional experience.

Probability this replicated?
50%
Study 3 of 6

Anchoring Effect (1974)

Participants who first saw a random number (e.g., spinning a wheel showing "65") gave higher estimates to unrelated questions (e.g., "What percentage of African nations are in the UN?") than those who saw lower random numbers. The researchers concluded that arbitrary initial values bias subsequent numerical judgments.

Probability this replicated?
50%
Study 4 of 6

Power Posing (2010)

Participants who held "expansive" poses (arms spread, taking up space) for two minutes showed increased testosterone and decreased cortisol compared to those in "contractive" poses. The researchers concluded that body posture directly affects hormone levels and feelings of power.

Probability this replicated?
50%
Study 5 of 6

Loss Aversion (1979)

When choosing between gambles, people required potential gains to be roughly twice as large as potential losses before they'd accept a 50/50 bet. The researchers concluded that losses loom larger than equivalent gains in decision-making.

Probability this replicated?
50%
Study 6 of 6

Elderly Priming (1996)

Participants who unscrambled sentences containing words related to old age (e.g., "Florida," "wrinkle," "gray") walked more slowly down the hallway afterward than those exposed to neutral words. The researchers concluded that subtle exposure to concepts can unconsciously influence behavior.

Probability this replicated?
50%

Your Results

Composite Judgment Score
0
Bayesian
0/2
partial credit
Diagnostic
0/1
bias avoided
Reflection
0/2
correct
Open-Minded
0
of 24

Bayesian Reasoning

Diagnostic Reasoning

Cognitive Reflection

Open-Minded Thinking

Leaderboard Name

Choose a display name for the leaderboard. This is optional—you can stay anonymous if you prefer.

Predictions

Forecast real events. Your accuracy will be tracked and compared to your judgment score.

Instructions: For each question, drag the slider to your probability estimate. The market price is shown for reference—you're welcome to agree or disagree with it.

Predictions are scored using the Brier score when events resolve. Lower is better.
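
As a rough sketch of the scoring rule, the Brier score for yes/no questions is the mean squared difference between each forecast probability and the outcome (1 if the event happened, 0 if not). The function name and the sample forecasts below are illustrative assumptions, not the site's actual code or data, and slider percentages are assumed to be converted to probabilities between 0 and 1.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or len(probabilities) == 0:
        raise ValueError("need equally sized, non-empty forecasts and outcomes")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts (as probabilities) and how the events resolved.
forecasts = [0.9, 0.2, 0.5]
resolved = [1, 0, 1]

score = brier_score(forecasts, resolved)
print(f"Brier score: {score:.3f}")
```

Under this form, always answering 50% yields 0.25, so a forecaster needs to score below that to beat an uninformative guess.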

Your Predictions

No predictions yet.

Leaderboard

Tracking the correlation between judgment scores and forecasting accuracy

The Research Question

Mellers et al. (2017) found that superforecasters' judgment scores (a composite of Bayesian reasoning, diagnostic thinking, and other measures) correlated r ≈ .46–.60 with their forecasting accuracy.

We're testing whether this holds in the wild. As predictions resolve, we'll report the correlation between assessment scores and Brier scores.
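
One way such a correlation could be computed is a plain Pearson coefficient over per-forecaster pairs of judgment score and Brier score. This is a minimal sketch: the function name and the sample numbers are hypothetical, and the project's actual analysis may differ. Because lower Brier scores are better, a judgment score that tracks accuracy would show up as a negative correlation.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one entry per forecaster with resolved predictions.
judgment_scores = [35, 50, 62, 78, 90]
brier_scores = [0.31, 0.27, 0.22, 0.18, 0.12]

r = pearson_r(judgment_scores, brier_scores)
print(f"r = {r:.2f}")  # negative here: higher judgment, lower (better) Brier
```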

Score ↔ Accuracy Correlation
Forecasters with Resolved Predictions
Questions Resolved

Top Forecasters (by Brier Score)

Lower Brier scores indicate better prediction accuracy. Scores range from 0 (perfect) to 1 (worst).

Rank User Brier Score Judgment Score Predictions
Waiting for predictions to resolve...

About

The science behind the assessment

This project tests whether laboratory measures of judgment quality predict real-world forecasting accuracy. The assessment is based on two major research programs.

The Good Judgment Project

From 2011 to 2015, Philip Tetlock and Barbara Mellers ran an IARPA-sponsored forecasting tournament with 5,000+ participants. "Superforecasters," the top 2%, outperformed professional intelligence analysts by roughly 30%, even though the analysts had access to classified information.

Mellers, B. et al. (2015). Identifying and cultivating superforecasters as a method of improving probabilistic predictions. Perspectives on Psychological Science, 10(3), 267-281.

Generalizable Judgment

A follow-up study asked whether superforecasters' skills generalized to other judgment tasks. They did better on Bayesian reasoning (40–78% vs. 5–28% for undergraduates) and diagnostic test selection (77% vs. 54% on congruence bias), and they showed better calibration.

Mellers, B. et al. (2017). How generalizable is good judgment? A multi-task, multi-benchmark study. Judgment and Decision Making, 12(4), 369-381.
r = .46–.60
Correlation between composite judgment score and superforecaster status

This Project

We're testing whether that correlation replicates outside the lab. Participants take the assessment, make predictions on real events, and we track how judgment scores relate to forecasting accuracy as events resolve.

What We Measure

  • Bayesian Reasoning: Updating probabilities given evidence (Eddy 1982, Gigerenzer & Hoffrage 1995)
  • Diagnostic Reasoning: Pseudodiagnosticity, congruence bias, information bias (Doherty et al. 1979, Baron et al. 1988)
  • Cognitive Reflection: Frederick's CRT (2005)
  • Actively Open-Minded Thinking: Baron's AOT scale (1993, 2019)
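
To illustrate the first item, Bayesian updating combines a base rate with how often a signal appears when the hypothesis is true and when it is false. The sketch below applies Bayes' rule to a positive test result. The function name and the input numbers are hypothetical and deliberately differ from the assessment problems.

```python
def posterior_given_positive(
    prior: float, p_pos_if_true: float, p_pos_if_false: float
) -> float:
    """P(hypothesis | positive signal) via Bayes' rule."""
    for value in (prior, p_pos_if_true, p_pos_if_false):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    true_positives = prior * p_pos_if_true
    false_positives = (1.0 - prior) * p_pos_if_false
    total_positives = true_positives + false_positives
    if total_positives == 0:
        raise ValueError("a positive signal is impossible with these inputs")
    return true_positives / total_positives


# Hypothetical inputs: 2% base rate, 90% hit rate, 10% false-positive rate.
posterior = posterior_given_positive(0.02, 0.90, 0.10)
print(f"Posterior: {posterior:.1%}")  # roughly 15.5%
```

Even with a fairly accurate test, the low base rate keeps the posterior far below the hit rate, which is the pattern these problems are designed to probe.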

Can Judgment Improve?

Yes. Unlike IQ, these skills appear trainable. The Good Judgment Project (GJP) found that brief training improved accuracy by 10–15%, and superforecasters themselves improved over time. The key skills are calibration, base rate thinking, scope sensitivity, and systematic updating.
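
Calibration, the first skill listed above, can be checked by grouping forecasts into probability bins and comparing the average forecast in each bin with how often those events happened. The sketch below is one simple way to do that. The function name, bin count, and sample data are assumptions for illustration, not part of the project.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def calibration_table(
    probabilities: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    if len(probabilities) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    bins = defaultdict(list)
    for p, o in zip(probabilities, outcomes):
        # Clamp so that a forecast of exactly 1.0 lands in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))
    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_forecast, observed, len(pairs)))
    return rows


# Hypothetical resolved forecasts: two at 10% and four at 90%.
probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
results = [0, 0, 1, 1, 1, 0]

rows = calibration_table(probs, results)
for mean_forecast, observed, count in rows:
    print(f"forecast {mean_forecast:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated forecaster's observed frequencies sit close to the mean forecast in every bin. In this toy data the 90% forecasts came true 75% of the time, which would suggest mild overconfidence if it held up over many more questions.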

Privacy

Your data is stored with an anonymous ID. We don't collect names or email addresses. You can bookmark your ID to return and track your predictions.