
Titan Analytics predicts March Madness

For the past six years, Kaggle, a data science organization that hosts online competitions, has partnered with the NCAA and Google Cloud to help budding statisticians and data scientists sift through the madness of March. This year, 871 competitors across the nation have put their basketball knowledge and data skills to the test.

The 2019 Google Cloud & NCAA Machine Learning Competition challenges all who enter to construct a machine learning model that predicts the March Madness bracket as accurately as possible. Novices and experts alike will go at it for the next week. Some will enter as individuals, while others will represent analytics companies.

Two students, Ethan Cohen and Jake Barbieri, will represent the Cornell-based statistical analysis company Titan Analytics. Together, the pair developed a model trained on NCAA basketball statistics from the 2018-2019 regular season and every NCAA tournament dating back to 2003. Based on this data, the model produces a win percentage for each team and makes a prediction for every possible game between any two teams in the pool of 64.
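To give a sense of scale, predicting every possible game among 64 teams means scoring 64 × 63 / 2 = 2,016 distinct pairings. The Python sketch below illustrates that enumeration only. The team ratings and the logistic formula are made-up assumptions for illustration and are not taken from the Titan Analytics model.

```python
from itertools import combinations
import math


def win_probability(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic formula: chance that team A beats team B."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def predict_all_matchups(ratings):
    """Return {(team_a, team_b): P(team_a wins)} for every unordered pair."""
    return {
        (a, b): win_probability(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Placeholder ratings for a 64-team pool (invented numbers, not real data).
ratings = {f"Team {i:02d}": 100.0 - i for i in range(1, 65)}
predictions = predict_all_matchups(ratings)
print(len(predictions))  # 2016 pairings
```

Each pairing is stored once, so the chance that the second team wins is simply one minus the stored value.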


In addition to factoring in stats like field goal percentage and offensive rebounds, their model introduces a new statistic: disruptions. The disruption factor takes the offensive team’s rhythm and designed play into account by tracking every action that could disrupt the rhythm of a play. Tipped passes that are still completed, random loose balls, and blocked shots retained by the shooting team are all coded as disruptors.
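As a rough illustration of how plays might be coded as disruptors, the Python sketch below tallies the three kinds of plays named above for each offensive team. The event labels, the play log format, and the sample data are hypothetical, since the article does not describe how Titan Analytics actually records or weights these events.

```python
from collections import Counter

# Hypothetical labels for the three kinds of plays described as disruptors.
DISRUPTORS = {
    "tipped_pass_completed",
    "loose_ball",
    "blocked_shot_retained",
}


def count_disruptions(events):
    """Count disruptor events per offensive team from (team, event_type) pairs."""
    counts = Counter()
    for team, event_type in events:
        if event_type in DISRUPTORS:
            counts[team] += 1
    return counts


# Invented play log for illustration only.
play_log = [
    ("Team A", "tipped_pass_completed"),
    ("Team A", "made_three_pointer"),
    ("Team B", "loose_ball"),
    ("Team A", "blocked_shot_retained"),
]
disruptions = count_disruptions(play_log)
print(disruptions["Team A"], disruptions["Team B"])  # 2 1
```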

Ethan and Jake’s model peaked when it broke into the top 100 models in the competition, on par with some of the best sports analytics minds across the country. So far, their model has been 73% accurate. We will see how the rest of March treats the two student data scientists.


