After last week’s Sweet 16 and Elite 8 games, let’s revisit our match-up prediction model to see what we can learn.
Sweet 16 Picks – 100% Accuracy
Last week we published our Sweet 16 picks from our match-up prediction model, and as it turned out, we went eight for eight! Using FiveThirtyEight's win probabilities for these teams, the odds against getting all eight games right were 23 to 1. Vegas odds would have made this success an even longer shot.
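For readers curious how a figure like 23 to 1 is computed, here is a minimal sketch. It assumes the eight games are independent, so the probability of a perfect round is the product of the individual win probabilities, and it converts a probability p into odds against as (1 - p) / p. The function names are our own, and the probabilities in the example are hypothetical placeholders, not FiveThirtyEight's actual numbers.

```python
from math import prod


def perfect_round_probability(win_probs):
    """Probability that every pick wins, ASSUMING the games are independent."""
    return prod(win_probs)


def odds_against(p):
    """Convert a probability p into 'X to 1' odds against."""
    return (1 - p) / p


# Hypothetical win probabilities for eight picks (NOT the real FiveThirtyEight figures).
example_probs = [0.70, 0.65, 0.68, 0.66, 0.64, 0.69, 0.67, 0.66]
p = perfect_round_probability(example_probs)
print(f"P(8 for 8) = {p:.3f}, odds against = {odds_against(p):.1f} to 1")
```

Read in reverse, odds of 23 to 1 against correspond to a probability of 1/24, or about 4.2 percent.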
So how do we explain this success compared with the mixed, though mostly positive, results from the first weekend? The answer: Sample Size of Relevant Data (SSRD). During the regular season, every tournament team plays some weaker opponents; however, this problem is amplified for teams in weaker conferences, since the bulk of their schedule is against weaker teams. Many of these conferences send only one team to the tournament, and the regular season data on these teams is less applicable to the tournament, where they face much stronger competition.
For example, take Buffalo, which we expected to pull off a first-round upset against West Virginia but which did not. According to the rankings, West Virginia was the 21st-best team in the nation entering the tournament. However, during the regular season Buffalo played only two games against teams ranked better than 95th overall! Even though our model accounts for strength of schedule, a relevant sample size of two is simply too small to support statistically confident conclusions.
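To illustrate why two games say so little, here is a sketch that puts an approximate 95 percent Wilson score interval around a team's win rate against strong opponents. This is a generic statistical illustration, not the method our model uses, and the records in the example are hypothetical.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    if games <= 0:
        raise ValueError("need at least one game")
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2))
    return center - half_width, center + half_width


# Hypothetical records: 1-1 in two relevant games vs. 10-10 in twenty.
for wins, games in [(1, 2), (10, 20)]:
    low, high = wilson_interval(wins, games)
    print(f"{wins}-{games - wins}: win rate plausibly between {low:.2f} and {high:.2f}")
```

With two games, the interval spans nearly the whole range from 0 to 1, so the data can barely distinguish a weak team from a strong one; with twenty games at the same win rate, it narrows considerably.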
So what is the most relevant data for predicting tournament outcomes? First, it is critical to understand the makeup of the tournament in terms of team quality. The chart below sorts the 68 tournament teams into ranking deciles and shows the proportion of tournament teams in each decile.
As you can see, 70 percent of tournament teams come from the top two deciles of the regular season rankings. So, for now, we will define the SSRD for the tournament as the regular season games played against opponents in the top 20 percent of teams (the top two deciles).
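The decile breakdown and the SSRD cutoff can be sketched as follows, assuming each team has an overall ranking (1 = best) out of `n_teams` teams. The helper names and the example ranks are hypothetical; the 20 percent cutoff follows the definition above.

```python
from collections import Counter


def rank_decile(rank, n_teams):
    """Map an overall ranking (1 = best) to a decile: 1 (top 10%) through 10."""
    return min(10, (rank - 1) * 10 // n_teams + 1)


def decile_shares(tournament_ranks, n_teams):
    """Proportion of tournament teams falling in each ranking decile."""
    counts = Counter(rank_decile(r, n_teams) for r in tournament_ranks)
    total = len(tournament_ranks)
    return {d: counts.get(d, 0) / total for d in range(1, 11)}


def is_relevant_opponent(opponent_rank, n_teams, top_fraction=0.20):
    """SSRD rule: an opponent is relevant if it ranks in the top 20% of teams."""
    return opponent_rank <= top_fraction * n_teams


# Hypothetical example: four tournament teams in a 350-team field.
shares = decile_shares([1, 10, 40, 100], n_teams=350)
print(shares[1] + shares[2])  # share of these teams in the top two deciles
```

In a hypothetical 350-team field, the top two deciles cover ranks 1 through 70, so `is_relevant_opponent(70, 350)` is true and `is_relevant_opponent(71, 350)` is false.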
The graph below illustrates the percentage of regular season games that all tournament teams and the Sweet 16 teams played against opponents in the top two deciles. As you can see, the Sweet 16 teams played a higher proportion of their regular season games against relevant competition (50%) than all tournament teams did (38%). This gives us greater statistical confidence in our Sweet 16 picks than in our first-round picks.
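The comparison in the graph comes down to a simple ratio: relevant games divided by total games, pooled across a group of teams. Here is a sketch, assuming a hypothetical schedule format that maps each team to the list of its opponents' rankings; the function name and toy data are ours, and no real schedule data is included.

```python
def relevant_game_share(schedules, n_teams, top_fraction=0.20):
    """Pooled share of games played against top-`top_fraction` opponents.

    schedules: dict mapping team name -> list of opponent rankings (1 = best).
    """
    cutoff = top_fraction * n_teams
    total = sum(len(opps) for opps in schedules.values())
    relevant = sum(1 for opps in schedules.values() for r in opps if r <= cutoff)
    return relevant / total if total else 0.0


# Hypothetical toy schedules, not real data.
toy = {
    "Team A": [5, 12, 40, 150, 200, 60],
    "Team B": [95, 120, 180, 250, 300, 30],
}
print(relevant_game_share(toy, n_teams=350))
```

Running the same function on the Sweet 16 schedules and on all tournament schedules would produce the two percentages being compared.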
We will apply SSRD to our Final Four picks, basing our predictions only on relevant regular season data. Using this revised dataset, our predictions for the Final Four and National Championship are below.
Looking forward to this weekend to see how these teams perform!
Many thanks to Dave Caughman, Strategist, and Drew Yao, Senior Operations Research Consultant, for their help with the analysis.