Methodology
How Oracle Arena measures predictions, drawn from Philip Tetlock's Good Judgment Project.
What we're scoring
A good forecast is one that is both accurate and well-calibrated. Tetlock's research shows that, given training and feedback, forecasters can substantially improve at probabilistic prediction — and that the best of them outperform subject-matter experts.
The Brier score
Every binary prediction is a probability between 0 and 1. When the question resolves, your Brier score is the squared error between that probability and the outcome (1 if the event happened, 0 if it did not).
Brier = (p − outcome)² where outcome ∈ {0, 1}
Example: you give 0.62 that the Lakers make the playoffs; they do. Brier = (0.62 − 1)² = 0.1444. Lower is better; 0 is perfect and 1 is the worst possible score.
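The formula above can be sketched in a few lines of Python. The function name `brier_binary` and the input checks are illustrative assumptions, not Oracle Arena's actual code.

```python
def brier_binary(p: float, outcome: int) -> float:
    """Squared error between a stated probability and a 0/1 outcome.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (p - outcome) ** 2


# The Lakers example: 0.62 on an event that happened.
print(round(brier_binary(0.62, 1), 4))  # 0.1444
```

A forecaster who always says 0.5 scores 0.25 on every question, which makes 0.25 a handy reference point for "no information".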
Multiple-choice Brier
A multiple-choice prediction is a probability vector that sums to 1. The Brier sums squared errors across all options.
Brier = Σᵢ (pᵢ − oᵢ)² where o is one-hot
Example: three candidates, you predict (0.5, 0.3, 0.2). Candidate 1 wins. Brier = (0.5−1)² + (0.3−0)² + (0.2−0)² = 0.25 + 0.09 + 0.04 = 0.38.
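As a rough sketch of the same calculation in Python, assuming the winning option is passed as a zero-based index (the name `brier_multi` and the tolerance on the sum check are assumptions made for this example):

```python
from typing import Sequence


def brier_multi(probs: Sequence[float], winner: int) -> float:
    """Sum of squared errors between a probability vector and a one-hot outcome.

    Hypothetical helper; `winner` is the zero-based index of the option that won.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= winner < len(probs):
        raise ValueError("winner must index one of the options")
    return sum(
        (p - (1.0 if i == winner else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


# Three candidates, candidate 1 (index 0) wins.
print(round(brier_multi([0.5, 0.3, 0.2], 0), 4))  # 0.38
```

Under this formula the score ranges from 0 to 2, and a two-option question scores twice the binary Brier for the same forecast, so the two scales are not directly comparable.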
Numeric interval score
A numeric prediction is a 90% confidence interval [L, U]. The score rewards narrow intervals but heavily penalises the truth falling outside.
Score = (U − L)
+ (2/α) · max(0, L − y)
+ (2/α) · max(0, y − U)
where y is the true value and α = 0.10 (90% interval)
Example: you predict [90, 110] and the truth turns out to be 100. Score = 20. If the truth were 80: 20 + (2/0.1)·10 = 220. Lower is better.
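A minimal Python sketch of the interval score, using the formula and α = 0.10 given above. The function and parameter names are assumptions chosen for this example.

```python
def interval_score(lower: float, upper: float, truth: float,
                   alpha: float = 0.10) -> float:
    """Interval width plus a (2/alpha)-scaled penalty for missing the truth.

    Hypothetical helper; alpha = 0.10 corresponds to a 90% interval.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = upper - lower
    # Only one of the two penalties can be non-zero for a given truth.
    miss_below = (2.0 / alpha) * max(0.0, lower - truth)
    miss_above = (2.0 / alpha) * max(0.0, truth - upper)
    return width + miss_below + miss_above


print(round(interval_score(90, 110, 100), 6))  # 20.0, truth inside the interval
print(round(interval_score(90, 110, 80), 6))   # 220.0, truth 10 below the lower bound
```

With α = 0.10, each unit by which the truth misses the interval costs as much as 20 units of extra width, which is why an overconfident narrow interval is usually the more expensive mistake.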
Gut Brier vs. Updated Brier
Your initial prediction, submitted within 24 hours of round lock, is frozen and used for your Gut Brier. Every update you submit before the question resolves feeds your Updated Brier, which is a time-weighted average of the Brier scores of every probability you held while the question was open.
Updated Brier = (1 / (T − t₀)) · Σᵢ (tᵢ₊₁ − tᵢ) · Brier(pᵢ, outcome) where t₀ is your first submission, pᵢ is the probability you held from tᵢ to tᵢ₊₁, and T is the resolution time, which closes the final interval
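The time weighting can be illustrated with a short Python sketch. It assumes updates arrive as (timestamp, probability) pairs sorted by time, that timestamps are plain numbers such as hours, and that the last probability is held until resolution. The name `updated_brier` and this data layout are assumptions, not the site's implementation.

```python
from typing import Sequence, Tuple


def updated_brier(updates: Sequence[Tuple[float, float]],
                  resolved_at: float, outcome: int) -> float:
    """Time-weighted average of the Brier score of each probability held.

    Hypothetical helper. `updates` is a time-sorted list of
    (timestamp, probability); the first entry is the initial prediction.
    """
    if not updates:
        raise ValueError("at least one prediction is required")
    # Each probability is held until the next update, or until resolution.
    boundaries = [t for t, _ in updates] + [resolved_at]
    if any(b < a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("timestamps must be sorted and precede resolution")
    open_time = resolved_at - boundaries[0]
    if open_time <= 0:
        raise ValueError("resolution must come after the first submission")
    weighted = 0.0
    for (t, p), t_next in zip(updates, boundaries[1:]):
        weighted += (t_next - t) * (p - outcome) ** 2
    return weighted / open_time


# Held 0.5 for 24h, 0.7 for 48h, 0.9 for 24h; the event happened.
history = [(0, 0.5), (24, 0.7), (72, 0.9)]
print(round(updated_brier(history, resolved_at=96, outcome=1), 4))  # 0.11
```

In this example the Gut Brier would be (0.5 − 1)² = 0.25, while the Updated Brier of 0.11 reflects the time spent at better probabilities.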
Gut measures your call; Updated measures your updating.
Calibration
Calibration asks: when you say 70%, does it happen 70% of the time? Your profile bins your binary predictions into 10% buckets and plots empirical frequency against stated probability. A perfectly calibrated forecaster's curve sits on the diagonal.
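One way to build the data behind such a curve is sketched below in Python. The bucket edges (a probability of exactly 1.0 joins the top bucket) and the name `calibration_table` are assumptions; the profile page may bin differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def calibration_table(
    predictions: Iterable[Tuple[float, int]],
) -> List[Tuple[float, float, int, float, float]]:
    """Group (probability, outcome) pairs into 10% buckets.

    Hypothetical helper. Returns one row per non-empty bucket:
    (bucket_low, bucket_high, count, mean_stated_probability, observed_frequency).
    """
    buckets: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, outcome in predictions:
        index = min(int(p * 10), 9)  # assumption: p == 1.0 falls in the 90-100% bucket
        buckets[index].append((p, outcome))
    rows = []
    for index in sorted(buckets):
        items = buckets[index]
        mean_p = sum(p for p, _ in items) / len(items)
        frequency = sum(o for _, o in items) / len(items)
        rows.append((index / 10, (index + 1) / 10, len(items), mean_p, frequency))
    return rows


sample = [(0.72, 1), (0.75, 1), (0.78, 0), (0.71, 1), (0.15, 0)]
for low, high, count, mean_p, frequency in calibration_table(sample):
    print(f"{low:.1f}-{high:.1f}: n={count} stated={mean_p:.2f} observed={frequency:.2f}")
```

A bucket sits on the diagonal when its observed frequency matches its mean stated probability; buckets with only a handful of predictions are noisy and should be read with caution.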
Resolution and disputes
Source-anchored questions resolve when any member confirms the linked source. Peer-verified questions resolve when the named resolver records the outcome. Every resolution is provisional for 48 hours, during which any member may dispute it. A dispute gathers votes from the group: the outcome with a plurality of votes wins, and a tie leaves the original outcome in place.
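The dispute rule can be sketched as a small Python function. It assumes each vote names an outcome label, that a tie for first place counts as a tie, and that a dispute with no votes leaves the original outcome; the text does not spell out these details, and the name `settle_dispute` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def settle_dispute(original: str, votes: Sequence[str]) -> str:
    """Return the outcome after a dispute vote.

    Hypothetical helper: plurality wins, ties keep the original outcome.
    """
    if not votes:
        return original  # assumption: no votes means nothing changes
    ranked = Counter(votes).most_common()
    top_outcome, top_count = ranked[0]
    # A tie for first place leaves the provisional resolution untouched.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return original
    return top_outcome


print(settle_dispute("yes", ["no", "no", "yes"]))  # no
print(settle_dispute("yes", ["no", "void"]))       # yes (tie)
```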
Glossary
- Brier score: Squared error of probability vs. outcome.
- Calibration: Whether stated probabilities match empirical frequencies.
- Base rate: The unconditioned prior probability of an outcome.
- Bayesian update: Revising a probability in light of new evidence.
- CI: Confidence interval; here, the bounds of the 90% interval.
- Resolver: The member named to record a peer-verified outcome.
- Co-sign: A second member's affirmation that a proposal is worth predicting.