For the past six years, Kaggle, a data science organization that hosts online competitions, has
partnered with the NCAA and Google Cloud to help budding statisticians and data scientists sift
through the madness of March. This year, 871 competitors across the nation have put their
basketball knowledge and data skills to the test.
The 2019 Google Cloud & NCAA Machine Learning Competition challenges all who enter to
construct a machine learning model that predicts the March Madness bracket as accurately as possible.
Novices and experts alike will go at it for the next week. Some will enter as individuals while
others will be representing analytics companies.
Two of those competitors, students Ethan Cohen and Jake Barbieri, are representing the
Cornell-based statistical analysis company Titan Analytics. Together, the pair developed
a model trained on NCAA basketball statistics from the 2018-2019 regular season and every
NCAA tournament dating back to 2003. Based on this data, the model produces a win percentage
for each team, and it makes a prediction for every possible game between any two teams in the
pool of 64 schools.
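To give a sense of scale, a pool of 64 schools allows 2,016 distinct pairings, and a model of this kind has to assign a win probability to each one. The sketch below is only an illustration of that enumeration, not the pair's actual model: the team names, the ratings, and the `win_probability` function (a simple logistic curve on a rating difference) are all hypothetical stand-ins.

```python
import math
from itertools import combinations


def win_probability(rating_a, rating_b):
    """Hypothetical stand-in for a trained model: a logistic curve
    on the difference between two made-up team ratings."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def predict_all_matchups(ratings):
    """Return {(team_a, team_b): probability that team_a wins}
    for every unordered pair of teams in the pool."""
    teams = sorted(ratings)
    return {
        (a, b): win_probability(ratings[a], ratings[b])
        for a, b in combinations(teams, 2)
    }


# 64 placeholder teams with invented ratings (not real NCAA data).
ratings = {f"team_{i:02d}": i * 0.05 for i in range(64)}
predictions = predict_all_matchups(ratings)

print(len(predictions))  # 2016 possible games: 64 * 63 / 2
```

In a real entry, the placeholder function would be replaced by whatever the trained model outputs for a given pair of teams.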
In addition to factoring in stats like field goal percentage and offensive rebounds, their model introduces a new statistic: disruptions. The disruption factor accounts for the offensive team’s
rhythm and designed play by capturing any action that could break up a play. Tipped passes that are still completed, random loose balls, and blocked shots retained by the shooting team are all coded as disruptors.
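One way such a statistic could be tallied is by tagging each play-by-play event and counting the ones that fall into the three disruptor categories named above. The snippet below is a rough sketch under that assumption; the event labels, the data layout, and the choice to credit each disruption to the offensive team whose play was disrupted are all hypothetical, since the article does not describe how the pair actually codes their data.

```python
from collections import Counter

# Hypothetical labels for the three disruptor types described in the article.
DISRUPTOR_EVENTS = {
    "tipped_pass_completed",
    "loose_ball",
    "blocked_shot_retained",
}


def count_disruptions(events):
    """Count disruptor events per offensive team.

    events: iterable of (offensive_team, event_type) tuples.
    """
    counts = Counter()
    for team, event_type in events:
        if event_type in DISRUPTOR_EVENTS:
            counts[team] += 1
    return counts


# Invented sample events, not real game data.
sample_events = [
    ("Team A", "tipped_pass_completed"),
    ("Team A", "made_shot"),
    ("Team B", "loose_ball"),
    ("Team A", "blocked_shot_retained"),
]

disruptions = count_disruptions(sample_events)
print(disruptions["Team A"], disruptions["Team B"])  # 2 1
```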
Ethan and Jake’s model peaked when it broke into the top 100 models in the competition, on par
with some of the best sports analytics minds across the country. So far, their model has been
73% accurate. We will see how the rest of March treats the two student data scientists.