Brier-Calibrated Success Score: What 78% Calibration Actually Means

Frequently Asked Questions

What is the Brier score?
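The Brier score measures how close probabilistic predictions are to what actually happened. For each prediction it takes the squared difference between the predicted probability of an outcome and whether that outcome occurred (1 if it did, 0 if it did not), then averages across predictions. A perfect Brier score is 0. In the multi-category form, which sums the squared error over every possible outcome, the worst-case score is 2, including for a binary outcome (the common single-probability form tops out at 1 instead). Lower scores mean better calibration.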
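As a rough illustration of the definition above, here is a minimal Python sketch of the multi-category Brier score, the form whose worst case is 2. The function name and the input layout (one probability list per fixture, plus the index of the outcome that occurred) are assumptions made for this example, not Tactiq's actual code.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-category Brier score (0 is perfect, 2 is worst).

    forecasts: list of probability lists, one list per fixture
               (for example [home, draw, away]).
    outcomes:  list of indices giving the outcome that occurred.
    Both names are illustrative, not taken from Tactiq.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one")

    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        # Squared error against a one-hot vector of what actually happened.
        total += sum(
            (p - (1.0 if i == actual else 0.0)) ** 2
            for i, p in enumerate(probs)
        )
    return total / len(forecasts)
```

A forecast that puts all its probability on the outcome that occurred scores 0, and one that puts all its probability on an outcome that did not occur scores 2.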
Why does Tactiq show calibration as a percentage instead of the raw Brier score?
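Raw Brier scores between 0.20 and 0.70 are hard to read at a glance, and most fans find 'lower is better' counterintuitive. Tactiq reverse-maps the Brier score to a 0 to 100 percent calibration score, where 100 means perfect calibration and 0 corresponds to the worst possible Brier score of 2, reached only by being fully confident and wrong every time. The reverse-mapping uses the formula round((1 - brier / 2) * 100), clamped to 0 to 100. Random guessing does not score 0 under this formula: a 50/50 forecast on a binary outcome has a Brier score of 0.5, which maps to 75 percent. The 78 percent in the title corresponds to a Brier score of about 0.44.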
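The reverse-mapping formula is small enough to write out directly. The sketch below follows the formula as stated in this answer; the function name is an assumption, and Python's built-in round() may treat exact .5 cases differently from whatever rounding Tactiq uses.

```python
def calibration_percent(brier):
    """Map a Brier score (0 to 2) to a 0 to 100 calibration percentage."""
    raw = round((1 - brier / 2) * 100)
    # Clamp so out-of-range inputs still give a value between 0 and 100.
    return max(0, min(100, raw))


for score in (0.0, 0.44, 2.0):
    print(score, calibration_percent(score))
```

With these inputs, a Brier score of 0.0 maps to 100, 0.44 maps to 78, and 2.0 maps to 0.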
How is calibration different from accuracy?
Accuracy asks whether the prediction was right. Calibration asks whether the predicted probability matched reality. If you predict 60 percent home win 100 times and the home side wins 60 of them, you are perfectly calibrated even though 40 of your predictions were 'wrong'. Calibration is the deeper measure of probabilistic skill.
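The 60 percent example can be checked in a few lines. This is only an illustration with made-up data that mirrors the answer above: 100 forecasts of a 60 percent home win, with the home side winning 60 times. The variable names are assumptions.

```python
# 100 forecasts of "60 percent home win"; the home side wins 60 of them.
predicted_home_win = [0.60] * 100
home_won = [True] * 60 + [False] * 40

# Accuracy: a forecast counts as "right" when the favoured side (p > 0.5) won.
correct = sum((p > 0.5) == won for p, won in zip(predicted_home_win, home_won))
accuracy = correct / len(home_won)

# Calibration: compare the average predicted probability with the observed rate.
mean_predicted = sum(predicted_home_win) / len(predicted_home_win)
observed_rate = sum(home_won) / len(home_won)
calibration_gap = abs(mean_predicted - observed_rate)

print(f"accuracy={accuracy:.2f} gap={calibration_gap:.2f}")
```

Accuracy comes out at 0.60 because 40 calls were 'wrong', while the gap between predicted and observed frequency is effectively zero, which is what perfect calibration means here.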
Why does Tactiq require 10 decided analyses before showing a calibration score?
With fewer than 10 decided fixtures, the Brier score swings sharply between matches. A single lopsided wrong call can move the score by 0.10 or more. The 10-match threshold gives a stable estimate. Below 10, Tactiq shows a 'not enough decided analyses yet' state instead.
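To see why small samples swing so much, the sketch below adds one badly wrong call (a per-fixture score of 1.5) to a running mean of 0.40, once with 5 decided fixtures and once with 50. The numbers are illustrative, and the function names and the None return value are assumptions rather than Tactiq's implementation; only the threshold of 10 comes from the text.

```python
MIN_DECIDED = 10  # threshold stated in this FAQ


def mean_after_one_more(current_mean, n_decided, new_score):
    """Mean Brier score after adding one fixture to n_decided existing ones."""
    return (current_mean * n_decided + new_score) / (n_decided + 1)


def display_state(per_fixture_scores):
    """Return the mean Brier score, or None when the sample is too small."""
    if len(per_fixture_scores) < MIN_DECIDED:
        return None  # caller would show 'not enough decided analyses yet'
    return sum(per_fixture_scores) / len(per_fixture_scores)


shift_small = mean_after_one_more(0.40, 5, 1.5) - 0.40
shift_large = mean_after_one_more(0.40, 50, 1.5) - 0.40
print(f"{shift_small:.3f} {shift_large:.3f}")
```

With 5 decided fixtures the single bad call moves the mean by about 0.18; with 50 it moves it by about 0.02.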
What calibration percentage should I aim for?
The four tiers in Tactiq are: 85 percent and above (very good), 75 to 84 percent (good), 65 to 74 percent (average), and under 65 percent (needs work). Most users land in the 75 to 84 percent range after 50 or more decided fixtures. A score of 85 percent or above is genuinely strong calibration and is unusual.
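The tier boundaries translate into a simple lookup. The thresholds and labels below are the ones listed in this answer; the function name is an assumption for the example.

```python
def calibration_tier(percent):
    """Return the tier label for a 0 to 100 calibration percentage."""
    if percent >= 85:
        return "very good"
    if percent >= 75:
        return "good"
    if percent >= 65:
        return "average"
    return "needs work"
```

For example, a score of 78 percent falls in the 'good' tier.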
Does the calibration score include simulator outputs?
No. The calibration score is computed only on base analyses, not on simulator outputs that depend on user-provided overrides. Including overrides would conflate model calibration with the user's judgment about the override inputs.
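In code terms, the exclusion amounts to filtering before scoring. The record type and its uses_overrides flag below are hypothetical, invented purely to illustrate the idea; the source does not describe how Tactiq stores or labels its analyses.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    fixture_id: str
    brier: float
    uses_overrides: bool  # hypothetical flag: True for simulator outputs


def base_analyses(analyses):
    """Keep only base analyses, dropping any that rely on user overrides."""
    return [a for a in analyses if not a.uses_overrides]


sample = [
    Analysis("fixture-1", 0.40, False),
    Analysis("fixture-2", 0.90, True),   # simulator output, excluded
    Analysis("fixture-3", 0.50, False),
]
scored = base_analyses(sample)
mean_brier = sum(a.brier for a in scored) / len(scored)
```

Only the two base analyses contribute to the mean here, so the user's override inputs have no effect on the score.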