Methodology

How Oracle Arena scores predictions, using methods drawn from Philip Tetlock's Good Judgment Project.

What we're scoring

A good forecast is one that is both accurate and well-calibrated. Tetlock's research shows that, given training and feedback, forecasters can substantially improve at probabilistic prediction — and that the best of them outperform subject-matter experts.

The Brier score

Every binary prediction is a probability p between 0 and 1. When the question resolves, the Brier score is the squared difference between p and the outcome (1 if the event happened, 0 if it did not).

Brier = (p − outcome)²   where outcome ∈ {0, 1}

Example: you say 0.62 that the Lakers make the playoffs, and they do. Brier = (0.62 − 1)² = 0.1444. Lower is better; 0 is perfect and 1 is the worst possible score.
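As a minimal sketch, the binary formula can be written as a few lines of Python. The function name brier_binary and the input checks are illustrative assumptions, not Oracle Arena's actual code.

```python
def brier_binary(p: float, outcome: int) -> float:
    """Squared error between a stated probability and a 0/1 outcome.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (p - outcome) ** 2


# The Lakers example: 0.62 stated, and the event happened (outcome = 1).
print(round(brier_binary(0.62, 1), 4))  # 0.1444
```

A confident wrong call is punished much harder than a hedged one: brier_binary(0.95, 0) is about 0.90, while brier_binary(0.60, 0) is 0.36.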

Multiple-choice Brier

A multiple-choice prediction is a probability vector that sums to 1. The Brier score sums the squared errors across all options, so it ranges from 0 (perfect) to 2 (all probability on a wrong option).

Brier = Σᵢ (pᵢ − oᵢ)²   where o is one-hot

Example: three candidates, you predict (0.5, 0.3, 0.2). Candidate 1 wins. Brier = (0.5−1)² + (0.3−0)² + (0.2−0)² = 0.25 + 0.09 + 0.04 = 0.38.
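The three-candidate example can be reproduced with a short sketch. The name brier_multi and the choice to identify the winner by index are assumptions made for illustration.

```python
import math


def brier_multi(probs: list[float], winner: int) -> float:
    """Sum of squared errors against a one-hot outcome vector.

    `winner` is the index of the option that actually occurred
    (a hypothetical convention for this sketch).
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    if not 0 <= winner < len(probs):
        raise ValueError("winner index out of range")
    return sum(
        (p - (1.0 if i == winner else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


# Three candidates, candidate 1 (index 0) wins.
print(round(brier_multi([0.5, 0.3, 0.2], 0), 2))  # 0.38
```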

Numeric interval score

A numeric prediction is a 90% confidence interval [L, U]. The score rewards narrow intervals but heavily penalises the true value y falling outside the interval. Lower is better.

Score = (U − L)
        + (2/α) · max(0, L − y)
        + (2/α) · max(0, y − U)

α = 0.10  (90% interval)

Example: you predict [90, 110] and the true value turns out to be 100. Score = 20. If the true value were 80 instead: 20 + (2/0.1)·10 = 220.
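A minimal sketch of the interval score, following the formula above term by term. The function name interval_score and its argument names are hypothetical.

```python
def interval_score(lower: float, upper: float, y: float,
                   alpha: float = 0.10) -> float:
    """Interval width plus a (2/alpha) penalty per unit of miss.

    alpha = 0.10 corresponds to a 90% interval. Illustrative helper,
    not Oracle Arena's actual code.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    width = upper - lower
    miss_below = max(0.0, lower - y)   # truth fell under the interval
    miss_above = max(0.0, y - upper)   # truth fell over the interval
    return width + (2.0 / alpha) * (miss_below + miss_above)


print(interval_score(90, 110, 100))  # truth inside: width only
print(interval_score(90, 110, 80))   # truth 10 below L: width + 20 * 10
```

With α = 0.10 each unit of miss costs as much as 20 units of extra width, so widening an interval is cheap compared with missing the truth by the same amount.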

Gut Brier vs. Updated Brier

Your initial prediction, submitted within 24 hours of round lock, is frozen and used for your Gut Brier. Every update you submit after that, until the question resolves, feeds your Updated Brier: a time-weighted average of the Brier scores of every probability you held while the question was open, starting with the initial one.

Updated Brier = (1 / (T − t₀)) · Σᵢ (tᵢ₊₁ − tᵢ) · Brier(pᵢ, outcome)

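where tᵢ is the time of your i-th submission (t₀ is the first), pᵢ is the probability submitted at tᵢ, and T is the resolution time; the last probability is held until T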
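The time-weighted average can be sketched as follows for a binary question. The function name updated_brier, the (time, probability) tuple layout, and the use of plain numbers for timestamps (for example, hours since round lock) are all assumptions for illustration.

```python
def updated_brier(submissions: list[tuple[float, float]],
                  resolve_time: float, outcome: int) -> float:
    """Time-weighted average Brier over every probability held.

    `submissions` is a list of (time, probability) pairs sorted by time;
    the first entry is the initial prediction at t0. Each probability is
    weighted by how long it stood before the next update (or resolution).
    Hypothetical helper, not Oracle Arena's actual code.
    """
    if not submissions:
        raise ValueError("at least one submission is required")
    times = [t for t, _ in submissions]
    if times != sorted(times) or times[-1] >= resolve_time:
        raise ValueError("times must be sorted and precede resolution")

    t0 = times[0]
    weighted = 0.0
    for i, (t, p) in enumerate(submissions):
        # The last probability is held until the question resolves.
        t_next = times[i + 1] if i + 1 < len(submissions) else resolve_time
        weighted += (t_next - t) * (p - outcome) ** 2
    return weighted / (resolve_time - t0)


# 0.5 held for 6 time units, then 0.8 for 4 units; the event happened.
# (6 * 0.25 + 4 * 0.04) / 10 = 0.166
print(round(updated_brier([(0, 0.5), (6, 0.8)], 10, 1), 3))  # 0.166
```

In this example the Gut Brier would be 0.25 (from the initial 0.5), so the update toward the correct outcome lowers the Updated Brier.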

Gut measures your initial call; Updated measures how well you revise it.

Calibration

Calibration asks: when you say 70%, does it happen 70% of the time? Your profile bins your binary predictions into 10% buckets and plots empirical frequency against stated probability. A perfectly calibrated forecaster's curve sits on the diagonal.
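One way to build the data behind such a calibration curve is sketched below. The function name calibration_table, the half-open bucket convention ([0.7, 0.8), with 1.0 placed in the top bucket), and the returned fields are assumptions, not a description of how the profile page is actually computed.

```python
def calibration_table(predictions: list[tuple[float, int]],
                      n_bins: int = 10) -> list[dict]:
    """Group (probability, outcome) pairs into equal-width buckets.

    Returns one row per non-empty bucket with the mean stated
    probability and the empirical frequency of the event.
    """
    buckets: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, outcome in predictions:
        # Small epsilon guards against float error at bucket edges;
        # min() keeps p == 1.0 in the top bucket.
        idx = min(int(p * n_bins + 1e-9), n_bins - 1)
        buckets[idx].append((p, outcome))

    rows = []
    for idx, items in enumerate(buckets):
        if not items:
            continue  # nothing to plot for an empty bucket
        rows.append({
            "bucket": (idx / n_bins, (idx + 1) / n_bins),
            "count": len(items),
            "mean_stated": sum(p for p, _ in items) / len(items),
            "empirical": sum(o for _, o in items) / len(items),
        })
    return rows


preds = [(0.72, 1), (0.75, 1), (0.78, 0), (0.12, 0)]
for row in calibration_table(preds):
    print(row)
```

Plotting empirical against mean_stated for each row gives the curve; points below the diagonal indicate overconfidence in that bucket, points above it underconfidence. Buckets with few predictions are noisy and should be read with care.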

Resolution and disputes

Source-anchored questions resolve when any member confirms the linked source. Peer-verified questions resolve when the named resolver records the outcome. Every resolution is provisional for 48 hours, during which any member may dispute it. Disputes gather votes from the group: plurality wins, and a tie leaves the original outcome in place.
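The dispute rule can be sketched as a small function. This assumes each vote names the outcome the voter believes is correct and that a tie means a tie for first place; both are interpretations for illustration, and the name settle_dispute is hypothetical.

```python
from collections import Counter


def settle_dispute(original: str, votes: list[str]) -> str:
    """Return the final outcome once the dispute vote closes.

    The outcome with the most votes wins; a tie for first place
    (or no votes at all) leaves the original outcome standing.
    """
    if not votes:
        return original
    ranked = Counter(votes).most_common()
    top_count = ranked[0][1]
    leaders = [outcome for outcome, n in ranked if n == top_count]
    if len(leaders) > 1:
        return original  # tie: original resolution stands
    return leaders[0]


print(settle_dispute("yes", ["no", "no", "yes"]))  # no
print(settle_dispute("yes", ["no", "yes"]))        # yes (tie)
```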

Glossary

Brier score: Squared error of probability vs. outcome.
Calibration: Whether stated probabilities match empirical frequencies.
Base rate: The unconditional prior probability of an outcome, before any case-specific evidence is considered.
Bayesian update: Revising a probability in light of new evidence.
CI: Confidence interval — here, the 90% interval bounds.
Resolver: The member named to record a peer-verified outcome.
Co-sign: A second member's affirmation that a proposal is worth predicting.