Methodology

How Oracle Arena measures predictions, drawn from Philip Tetlock's Good Judgment Project.

What we're scoring

A good forecast is one that is both accurate and well-calibrated. Tetlock's research shows that, given training and feedback, forecasters can substantially improve at probabilistic prediction — and that the best of them outperform subject-matter experts.

The Brier score

Every binary prediction is a probability between 0 and 1. When the question resolves, the Brier score is the squared error between that probability and the outcome (1 if the event happened, 0 if it did not).

Brier = (p − outcome)²   where outcome ∈ {0, 1}

Example: you give 0.62 that the Lakers make the playoffs; they do. Brier = (0.62 − 1)² = 0.1444. Lower is better; 0 is perfect and 1 is the worst possible score.
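A minimal Python sketch of the binary formula, reproducing the Lakers example. The function name brier_binary is illustrative and is not taken from Oracle Arena's code.

```python
def brier_binary(p: float, outcome: int) -> float:
    """Squared error between a stated probability and a 0/1 outcome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (p - outcome) ** 2


# 0.62 on an event that happened
print(round(brier_binary(0.62, 1), 4))  # 0.1444
```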

Multiple-choice Brier

A multiple-choice prediction is a probability vector that sums to 1. The Brier sums squared errors across all options.

Brier = Σᵢ (pᵢ − oᵢ)²   where o is one-hot

Example: three candidates, you predict (0.5, 0.3, 0.2). Candidate 1 wins. Brier = (0.5−1)² + (0.3−0)² + (0.2−0)² = 0.25 + 0.09 + 0.04 = 0.38.
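The same calculation as a short Python sketch. The name brier_multi and the choice to pass the winner as a zero-based index are assumptions made for illustration.

```python
from typing import Sequence


def brier_multi(probs: Sequence[float], winner: int) -> float:
    """Sum of squared errors between a probability vector and a one-hot outcome."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= winner < len(probs):
        raise ValueError("winner must index one of the options")
    return sum(
        (p - (1.0 if i == winner else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


# Three candidates, the first one wins
print(round(brier_multi([0.5, 0.3, 0.2], 0), 2))  # 0.38
```

Under this formula the score runs from 0 (all probability on the winner) to 2 (all probability on a single loser).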

Numeric interval score

A numeric prediction is a 90% confidence interval [L, U]. The score rewards narrow intervals but heavily penalises the truth falling outside.

Score = (U − L)
        + (2/α) · max(0, L − y)
        + (2/α) · max(0, y − U)

where y is the true value and α = 0.10  (90% interval)

Example: you predict [90, 110] and the true value turns out to be 100. Score = 20, the width of the interval. If the true value were 80, the score would be 20 + (2/0.1)·10 = 220. Lower is better.
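A minimal Python sketch of the interval score, checked against both cases in the example. The function name interval_score and its parameter names are illustrative.

```python
def interval_score(lower: float, upper: float, truth: float,
                   alpha: float = 0.10) -> float:
    """Interval width plus a (2/alpha) penalty per unit the truth lies outside."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = upper - lower
    miss_below = max(0.0, lower - truth)  # truth fell under the interval
    miss_above = max(0.0, truth - upper)  # truth fell over the interval
    return width + (2.0 / alpha) * (miss_below + miss_above)


print(round(interval_score(90, 110, 100), 2))  # 20.0
print(round(interval_score(90, 110, 80), 2))   # 220.0
```

With α = 0.10, each unit of miss costs as much as 20 units of extra width, so a narrow interval only pays off if it actually contains the truth.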

Gut Brier vs. Updated Brier

Your initial prediction, submitted within 24 hours of round lock, is frozen and used for your Gut Brier. Every subsequent update you submit before the question resolves feeds your Updated Brier, which is a time-weighted average of the Brier scores of every probability you held while the question was open.

Updated Brier = (1 / (T − t₀)) · Σᵢ (tᵢ₊₁ − tᵢ) · Brier(pᵢ, outcome)

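where t₀ is the time of your first submission, tᵢ is the time of your i-th update, pᵢ is the probability you held from tᵢ until tᵢ₊₁, and T is the resolution time (the last interval ends at T)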

Gut measures your call; Updated measures your updating.
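The time-weighted average can be sketched in Python as follows. This assumes each update is stored as a (timestamp, probability) pair in chronological order, with timestamps in any consistent unit such as hours. The function name updated_brier and this data layout are hypothetical, not Oracle Arena's actual storage format.

```python
from typing import Sequence, Tuple


def updated_brier(updates: Sequence[Tuple[float, float]],
                  resolved_at: float, outcome: int) -> float:
    """Time-weighted mean Brier over every probability held until resolution."""
    if not updates:
        raise ValueError("at least one submission is required")
    times = [t for t, _ in updates]
    if times != sorted(times):
        raise ValueError("updates must be in chronological order")
    t0 = times[0]
    if resolved_at <= times[-1]:
        raise ValueError("resolution must come after the last update")
    total = 0.0
    for i, (t, p) in enumerate(updates):
        # Each probability is held until the next update, or until resolution
        t_next = times[i + 1] if i + 1 < len(updates) else resolved_at
        total += (t_next - t) * (p - outcome) ** 2
    return total / (resolved_at - t0)


# Held 0.5 for 6 hours, then 0.8 for 4 hours; the event happened
print(round(updated_brier([(0, 0.5), (6, 0.8)], 10, 1), 3))  # 0.166
```

For this history the Gut Brier would be (0.5 − 1)² = 0.25, so the update towards 0.8 improved the score.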

Calibration

Calibration asks: when you say 70%, does it happen 70% of the time? Your profile bins your binary predictions into 10% buckets and plots empirical frequency against stated probability. A perfectly calibrated forecaster's curve sits on the diagonal.
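A rough Python sketch of the bucketing behind such a curve. It assumes half-open buckets [0.0, 0.1), [0.1, 0.2), and so on, with 1.0 placed in the top bucket. How Oracle Arena treats bucket edges is not stated here, and the name calibration_table is illustrative.

```python
from typing import List, Sequence, Tuple


def calibration_table(preds: Sequence[Tuple[float, int]],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean stated probability, empirical frequency, count) per non-empty bucket."""
    buckets: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, outcome in preds:
        # Clamp so that p == 1.0 lands in the top bucket
        idx = min(int(p * n_bins), n_bins - 1)
        buckets[idx].append((p, outcome))
    rows = []
    for bucket in buckets:
        if bucket:
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            freq = sum(o for _, o in bucket) / len(bucket)
            rows.append((mean_p, freq, len(bucket)))
    return rows


history = [(0.72, 1), (0.75, 1), (0.78, 0), (0.71, 1), (0.12, 0), (0.15, 0)]
for mean_p, freq, n in calibration_table(history):
    print(f"stated {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

Plotting observed against stated for each row gives the calibration curve; points below the diagonal indicate overconfidence in that bucket.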

Resolution and disputes

Source-anchored questions resolve when any member confirms the outcome against the linked source. Peer-verified questions resolve when the named resolver records the outcome. Every resolution is provisional for 48 hours, during which any member may dispute it. Disputes gather votes from the group: the plurality wins, and a tie leaves the original outcome in place.
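The dispute rule can be sketched in Python as below. This assumes each vote names a proposed outcome as a string and that an empty ballot leaves the resolution unchanged. Neither detail is specified above, and the function name settle_dispute is hypothetical.

```python
from collections import Counter
from typing import Sequence


def settle_dispute(original: str, votes: Sequence[str]) -> str:
    """Plurality wins; a tie for first place keeps the original outcome."""
    if not votes:
        return original  # assumption: no votes means no change
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return original  # tie for first place
    return ranked[0][0]


print(settle_dispute("yes", ["no", "no", "yes"]))  # no
print(settle_dispute("yes", ["no", "yes"]))        # yes
```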

Glossary

Brier score
Squared error of probability vs. outcome.
Calibration
Whether stated probabilities match empirical frequencies.
Base rate
The unconditional prior probability of an outcome, before any case-specific evidence is considered.
Bayesian update
Revising a probability in light of new evidence.
CI
Confidence interval — here, the 90% interval bounds.
Resolver
The member named to record a peer-verified outcome.
Co-sign
A second member's affirmation that a proposal is worth predicting.