Grand Slam Tennis Situational Analysis using Spark
The IBM jStart team developed and deployed a set of REST APIs on IBM Bluemix to enable programmatic upload of new historical tennis match data and to give the IBM SlamTracker application real-time access to the results. Analytics processes running on Spark allowed us to compute statistics for “pressure situations”, such as break points or a player being down 0-30 in a game. The new pressure situation metrics first became available to IBM SlamTracker users during the US Open 2016, as illustrated in the image below:
When following a sporting event, one inevitably gets immersed in the world of numbers and statistics. Past scores, rankings, winning percentage, all-time records, yards run, passes made, assists, home runs – everything is quantified and recorded. Why is it so important? There are a number of reasons: enabling deeper immersion in the sport, understanding the performance of athletes and teams, and providing a frame of reference for ratings, to name a few. With varying degrees of success, statistics are also used to predict future performance.
Tennis is no exception. There is a history of all matches played by every professional player. Many parameters are also tracked in every match: scores, serves, breakpoints, unforced errors, double faults, etc. Professionals and amateurs alike analyze this information to create commentary, insights and predictions. There are sophisticated tools for viewing and analyzing these statistics, for instance, IBM SlamTracker (used by every Grand Slam tournament for the past 20 years). IBM SlamTracker not only supplies statistics for past matches, but also identifies the most important “keys to match” for every player before a competition, thus helping viewers focus on the most relevant aspects of each player’s performance.
One area that is not addressed as much is situational analysis, that is, the analysis of changes in a player’s behavior as the match progresses. Players face multiple pressure situations during a match, such as breakpoints, losing their serve, being down in a set, etc. How do they cope with these situations? Do they alter their normal behavior? Serve faster or slower? Use different kinds of serves? Go to the net more often? Do they play better or worse in certain situations? What are their chances of getting out of these situations with a win? Does it matter who they are playing against?
The IBM jStart team has been working with the IBM Interactive Experience (IBMiX) team, which developed the IBM SlamTracker experience, to address these types of questions and add support for situational analysis to the IBM SlamTracker application.
Our approach was simple:
- Obtain historical data about players’ performance at all Grand Slam matches since 2005.
- Identify the types of critical situations that may occur during matches.
- Identify a set of parameters in player behavior that may be altered during these situations.
- For every player, and every situation, look at all past data and calculate:
  - Probabilities for various outcomes (in a current game, set or match)
  - Potential deviations from normal behavior for every one of the behavioral parameters.
- Add critical situation triggers to the IBM SlamTracker application and display calculated outcome probabilities to the viewers in real time when such situations occur.
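The calculation step in this approach can be illustrated with a minimal sketch. It is plain Python rather than the Scala Spark job described later, and the record layout, player names, and situation labels are hypothetical placeholders, not the actual data model. It counts how often each player won the game after reaching a given situation and reports the sample size alongside the rate.

```python
from collections import defaultdict

# Hypothetical records: (player, situation, won_game). Not the real schema.
observations = [
    ("Player A", "down_0_30", True),
    ("Player A", "down_0_30", False),
    ("Player A", "down_0_30", False),
    ("Player A", "break_point_against", True),
    ("Player B", "down_0_30", True),
]


def outcome_probabilities(records):
    """Return {(player, situation): (win_rate, sample_size)}."""
    counts = defaultdict(lambda: [0, 0])  # [wins, total] per key
    for player, situation, won in records:
        tally = counts[(player, situation)]
        tally[0] += int(won)
        tally[1] += 1
    return {key: (wins / total, total) for key, (wins, total) in counts.items()}


probabilities = outcome_probabilities(observations)
for (player, situation), (rate, n) in sorted(probabilities.items()):
    print(f"{player} {situation}: {rate:.2f} over {n} occurrences")
```

Keeping the sample size next to each rate matters because some player and situation pairs occur only a handful of times, and a rate computed from one or two occurrences says very little.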
Thanks to IBM’s continuous involvement in Grand Slam tournaments, the data was readily available: 9,623 matches, 28,479 sets, 296,936 games and 1,725,931 points played by 930 players during the last 46 Grand Slam tournaments. It was more than enough!
We then picked a set of the most interesting critical situations. The list included situations such as:
- Being down N-M by sets in a match (for instance, 1-2 or 0-2)
- Being down N-M by games in a set (for instance, 0-3 or 2-4)
- Being down N-M in a game (for instance, 0-30 or 30-40)
- Breakpoint situation (either having an opportunity to win it or being at risk of losing it)
- Mini-Breakpoint situation during a tiebreak
- Serving after failing to convert a breakpoint
For every one of these situations and for each player we wanted to know the probability of winning the game, set or match.
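To make the in-game situations concrete, here is a small sketch of how some of them could be detected from the point count. It is an illustration under stated assumptions, not the triggers used in IBM SlamTracker: scores are given as points won in a game that is still in progress, server first, and the function names and tag strings are hypothetical.

```python
POINT_LABELS = ["0", "15", "30", "40"]


def is_break_point(server_points, receiver_points):
    """True when the receiver is one point away from winning the game."""
    return receiver_points >= 3 and receiver_points - server_points >= 1


def score_label(server_points, receiver_points):
    """Conventional call of the score, server first (game still in progress)."""
    if server_points >= 3 and receiver_points >= 3:
        if server_points == receiver_points:
            return "deuce"
        return "ad-in" if server_points > receiver_points else "ad-out"
    return f"{POINT_LABELS[server_points]}-{POINT_LABELS[receiver_points]}"


def server_situations(server_points, receiver_points):
    """Set of hypothetical pressure tags that apply to the server."""
    tags = set()
    if is_break_point(server_points, receiver_points):
        tags.add("break_point_against")
    label = score_label(server_points, receiver_points)
    if label in ("0-30", "30-40"):
        tags.add("down_" + label)
    return tags


print(score_label(0, 2), server_situations(0, 2))
print(score_label(2, 3), server_situations(2, 3))
```

A single score can carry more than one tag: 30-40 is both a "down in a game" situation and a breakpoint, so each occurrence would count toward both statistics.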
Next, we chose a number of behavioral parameters to focus on:
- (*) Direction of a serve (wide, center or body)
- (*) Speed of a serve
- Length of a rally
- Unforced errors
(*) – Included in the first release
For each of these parameters we intended to look for potential deviations from players’ normal behavior. On top of that, we also decided to check whether a player’s behavior changed when facing different groups of opponents (top seeded players, bottom seeded players, unseeded players; top 5 ranked players, top 10 ranked players, etc.).
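As a rough illustration of what a deviation from normal behavior could look like for the two serve parameters, the sketch below compares a player's serves under pressure with a baseline. The data, the field layout, and the choice of baseline (all of the player's serves, pressure or not) are assumptions made for this example, not the definitions used in the project.

```python
from collections import Counter
from statistics import mean

# Hypothetical serves for one player: (speed_mph, direction, under_pressure)
serves = [
    (118, "wide", False),
    (122, "center", False),
    (120, "body", False),
    (120, "center", False),
    (112, "wide", True),
    (116, "wide", True),
]


def speed_deviation(records):
    """Mean serve speed under pressure minus the mean over all serves."""
    baseline = mean(speed for speed, _, _ in records)
    pressure = mean(speed for speed, _, flag in records if flag)
    return pressure - baseline


def direction_shift(records):
    """Change in the share of each serve direction under pressure."""
    def shares(rows):
        counts = Counter(direction for _, direction, _ in rows)
        total = sum(counts.values())
        return {d: c / total for d, c in counts.items()}

    baseline = shares(records)
    pressure = shares([r for r in records if r[2]])
    return {d: pressure.get(d, 0.0) - baseline[d] for d in baseline}


print(speed_deviation(serves))
print(direction_shift(serves))
```

Both functions assume at least one pressure serve is present; with real data, a player with no recorded serves in a given situation would need to be skipped. Restricting `records` to matches against a particular group of opponents gives the per-group comparison mentioned above.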
Not all of the chosen critical situations and behavioral parameters made it into the first release. Even with a limited selection, the number of calculations to be performed was daunting, especially considering that during the tournaments we needed to re-run all calculations daily to take advantage of new data coming in.
We decided to use the IBM Bluemix Apache Spark service to solve this problem. We used Scala to create a Spark job that performed the calculations. The Spark job was submitted to a small IBM Bluemix Spark cluster once a day using a scheduler. It pulled all historical data from a PostgreSQL database (another IBM Bluemix service), performed the required calculations and stored the results back into the same database. It worked well: the calculations did not take long to finish. Not only was the current workload processed quickly, but we also had room to grow. When new situations and behavioral parameters were added, we could easily scale the cluster to avoid lengthening the computation window.
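The read, compute and store-back cycle of such a daily job can be sketched as follows. This is not the Scala Spark job itself: it uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL, and the table names `situation_outcomes` and `situation_stats` and their columns are hypothetical.

```python
import sqlite3


def run_daily_job(conn):
    """Recompute win rates per (player, situation) and replace the results table."""
    conn.execute("DROP TABLE IF EXISTS situation_stats")
    conn.execute(
        """
        CREATE TABLE situation_stats AS
        SELECT player,
               situation,
               AVG(won) AS win_rate,
               COUNT(*) AS sample_size
        FROM situation_outcomes
        GROUP BY player, situation
        """
    )
    conn.commit()


# Stand-in for the historical data store, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE situation_outcomes (player TEXT, situation TEXT, won INTEGER)"
)
conn.executemany(
    "INSERT INTO situation_outcomes VALUES (?, ?, ?)",
    [
        ("Player A", "down_0_30", 1),
        ("Player A", "down_0_30", 0),
        ("Player B", "break_point_against", 1),
    ],
)

run_daily_job(conn)
rows = conn.execute(
    "SELECT player, situation, win_rate, sample_size "
    "FROM situation_stats ORDER BY player, situation"
).fetchall()
print(rows)
```

Recomputing everything from scratch on each run, rather than updating results incrementally, matches the daily full re-run described above and keeps the job simple: a failed run can just be repeated.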
Finally, a REST API was developed and deployed on IBM Bluemix to enable programmatic upload of new historical data and to give the IBM SlamTracker application real-time access to the results.
The IBMiX team added triggers for some of the critical situations to the application and developed a presentation layer for the results. The new functionality became available to IBM SlamTracker users during the US Open 2016.
Sample visualization inside the IBM SlamTracker application for US Open 2016
Learn more about the Cognitive Insights at the 2016 US Open Tennis Tournament. Read more about other emerging technology projects on the IBM jStart website.