Kaggle Brackets
Kaggle runs an annual March Madness competition in which competitors use data science to make predictions about the NCAA tournament. It’s a little deeper than the standard bracket, though, since (in true statistical fashion) we estimate the probability of each game’s outcome rather than simply picking a winner. Once a game is played, your prediction for it is scored using the Brier score. This web page lets you visualize the Brier score game by game across your Kaggle submission.
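For reference, the Brier score is the mean squared difference between each predicted probability and the actual result (1 if the team you assigned the probability to won, 0 if it lost), so lower is better. The sketch below is a minimal Python illustration; the function name `brier_score` and the sample numbers are hypothetical and not taken from Kaggle’s own scoring code.

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted win probabilities and 0/1 results."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("need at least one game")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


# Hypothetical games: predicted probability that the first-listed team wins,
# and whether it actually won (1) or lost (0).
preds = [0.9, 0.6, 0.3]
results = [1, 0, 0]
print(round(brier_score(preds, results), 4))  # (0.01 + 0.36 + 0.09) / 3 -> 0.1533
```

Predicting 0.5 for every game scores exactly 0.25, which makes a handy baseline, while a confident miss (0.9 on a team that loses) costs 0.81 for that game alone.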
The easiest way to see how this works is to choose an example from the Examples menu. The bracket should appear with each game shaded according to the quality of its prediction: green is good, red is bad, and near-white is in between. You can hover over a game for more information. You can also project the tournament to see how the predictions expect it to play out; in that case, the games are shaded by the confidence of each prediction.
If you have your own submission for any year for which Kaggle provides results data, you can click the “Browse” button to upload it. I’ll try to keep the 2025 data up to date during the tournament.
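Kaggle submissions for this competition have typically been CSV files with an `ID` column of the form `Season_TeamID1_TeamID2` (lower team ID first) and a `Pred` column giving the probability that the lower-ID team wins. Assuming that format, a sketch like the following reads a submission and looks up the probability for any matchup in either order; the helper names `parse_submission` and `win_probability` and the team IDs are hypothetical.

```python
import csv
import io


def parse_submission(text):
    """Map (season, lower_team_id, higher_team_id) -> P(lower-ID team wins)."""
    preds = {}
    for row in csv.DictReader(io.StringIO(text)):
        season, t1, t2 = (int(part) for part in row["ID"].split("_"))
        preds[(season, t1, t2)] = float(row["Pred"])
    return preds


def win_probability(preds, season, team_a, team_b):
    """Probability that team_a beats team_b, regardless of which ID is smaller."""
    low, high = sorted((team_a, team_b))
    p = preds[(season, low, high)]
    return p if team_a == low else 1.0 - p


# Hypothetical two-row submission.
sample = """ID,Pred
2025_1101_1102,0.70
2025_1101_1103,0.25
"""
preds = parse_submission(sample)
print(win_probability(preds, 2025, 1102, 1101))  # complement of the stored 0.70
```

Storing each matchup once and taking the complement for the reverse order mirrors the submission format itself, so the two directions can never disagree.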
All the data comes from the Kaggle competition. If you simply want to explore the tournaments, you can go up one page.
Logistic Eigenratings
The predictions in the example files are all produced using a technique I call Logistic Eigenrating. We first rate the teams using a technique very similar to Google’s PageRank algorithm. Since we have 50 years of data for the men’s tournament and 28 years for the women’s, we can then use logistic regression to predict future outcomes from those ratings.
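The exact formulation isn’t spelled out here, so the following is only a guess at the general shape of the two steps: a PageRank-style power iteration in which every loss acts as a “link” from the loser to the winner, followed by a one-parameter logistic regression on the log ratio of two teams’ ratings. The damping factor of 0.85, the log-ratio feature, the L2 penalty, and all function names and games are assumptions for illustration.

```python
import math


def eigenratings(games, teams, damping=0.85, iters=100):
    """PageRank-style ratings from (winner, loser) pairs.

    Each loss is treated as a link from the loser to the winner, so beating a
    highly rated team passes along more rating than beating a weak one.
    """
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    links = [[] for _ in range(n)]  # links[i] = indices of teams that beat team i
    for winner, loser in games:
        links[index[loser]].append(index[winner])
    rating = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n  # teleport term keeps every rating positive
        for i, targets in enumerate(links):
            if targets:
                share = damping * rating[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:
                # An undefeated team has no outgoing links; spread its weight evenly.
                for j in range(n):
                    new[j] += damping * rating[i] / n
        rating = new
    return {t: rating[index[t]] for t in teams}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weight(games, ratings, l2=0.01, lr=0.5, steps=2000):
    """Gradient descent for w in P(a beats b) = sigmoid(w * log(r_a / r_b)).

    With no intercept, each game contributes -log(sigmoid(w * x)) with
    x = log(r_winner / r_loser) whichever team is listed first, so the
    winner-first features are enough. The L2 term keeps w finite.
    """
    xs = [math.log(ratings[winner] / ratings[loser]) for winner, loser in games]
    weight = 0.0
    for _ in range(steps):
        grad = sum(-(1.0 - sigmoid(weight * x)) * x for x in xs) / len(xs) + l2 * weight
        weight -= lr * grad
    return weight


def win_prob(ratings, weight, a, b):
    """Predicted probability that team a beats team b."""
    return sigmoid(weight * math.log(ratings[a] / ratings[b]))


# Hypothetical results as (winner, loser) pairs.
teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "A")]
ratings = eigenratings(games, teams)
weight = fit_weight(games, ratings)
print({t: round(r, 3) for t, r in ratings.items()})
print(round(win_prob(ratings, weight, "A", "C"), 3))
```

Because this model has no intercept, P(a beats b) and P(b beats a) always sum to 1, so the order in which teams are listed doesn’t matter. Here the weight is fit on the same handful of games only to keep the example short; the approach described above would instead fit it on decades of past tournament games, using ratings computed from each season’s results.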
The technique works reasonably well, though I don’t think it will ever be a money-maker in the Kaggle competition. It’s well motivated and makes a reasonable baseline.