Predicting the winners: How IBM uses data and AI at the US Open to drive insights and fan engagement
As the US Open’s official technology partner, IBM Consulting works with the United States Tennis Association (USTA) to turn tennis data into engaging fan insights through AI and automation.
Artificial intelligence (AI) increasingly influences how people understand data, whether they are CTOs making enterprise-level decisions or tennis fans deciding which match to watch. For the USTA, as with all our clients, IBM is committed to delivering explainable AI, following processes that allow people to understand and trust the results and output produced by machine learning algorithms.
This process begins with solid and reliable data. At the US Open, this comprises a massive volume of structured and unstructured data from a wide variety of sources:
- Data on 128 men and 128 women players, including age, height, weight, tour ranking and recent performance
- 7 million play-based data points, including serve direction, return shot type, rally count and ball position, captured throughout the tournament
- Language and sentiment from 100 million news articles
Let’s look at how IBM combines and analyzes this data to deliver statistics and analytics in natural language, and how we break open the “black box” of AI to deliver trustworthy, explainable insights that complement outside media sources.
How the IBM Power Index analyzes player momentum
Tennis is a game of momentum. Within matches, top players use it strategically. They can serve quickly when winning or max out the serve clock to break their opponent’s rhythm. When down two breaks in a set, they may cede momentum to their opponent to preserve their energy for the next set. This strategy is particularly useful in the best-of-five men’s singles format of a Grand Slam tournament like the US Open.
But momentum is also a factor from day to day and tournament to tournament. It helps explain long winning streaks and why qualifiers sometimes make deeper runs than expected through main draws. These streaks and upsets are captured and amplified by the news media, which can in turn enhance a player’s confidence, invigorate courtside crowds and affect match outcomes.
The IBM Power Index quantifies player momentum leading into the US Open and day by day within the tournament. This index complements the official ATP and WTA rankings that are based on a 52-week rolling window.
The Power Index incorporates over 25 factors. Player performance factors include win-loss ratio, win margin, rank differential, court surface, injury status, number and level of tournaments played and round progression. These factors are interpreted by IBM Cloud® Functions, a serverless programming platform that pulls statistical data from SportRadar.
The Power Index also analyzes media sources using the natural language processing capabilities of IBM Watson® Discovery. Sentiment around player performance and health is analyzed from industry punditry and fan perspectives, drawn from hundreds of thousands of trusted news sources.
These analyses generate a series of predictive insights, including the following:
- Ones to Watch: A pre-tournament view of players whom the Power Index places five or more positions higher than their current tour rank.
- Upset Alerts: Matches in which the Power Index favors the lower-seeded player.
- Likelihood to Win: A win probability model for each player. This statistic can shift daily to reflect the latest performance and punditry data.
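The first two insights above amount to simple comparisons between a Power Index ordering and the official ranking. The following is a minimal sketch of that logic, not IBM's implementation. The `Player` class, its field names, and the use of tour rank as a stand-in for seeding are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    tour_rank: int    # official ATP/WTA rank, 1 = best
    power_rank: int   # hypothetical rank by Power Index, 1 = best


def ones_to_watch(players, min_gap=5):
    """Players the Power Index places at least `min_gap` spots above their tour rank."""
    return [p for p in players if p.tour_rank - p.power_rank >= min_gap]


def is_upset_alert(a, b):
    """True when the Power Index favors the lower-ranked player.

    Assumption: tour rank approximates seeding for this sketch.
    """
    lower, higher = (a, b) if a.tour_rank > b.tour_rank else (b, a)
    return lower.power_rank < higher.power_rank


field = [Player("A", 30, 12), Player("B", 3, 4), Player("C", 10, 6)]
watch_names = [p.name for p in ones_to_watch(field)]
upset = is_upset_alert(Player("D", 20, 2), field[1])
```

Here player A sits 18 places higher on the Power Index than on tour and qualifies as one to watch, while player C, only 4 places higher, does not.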
Behind Match Insights with IBM Watson
Match Insights with Watson are AI-generated fact sheets that draw on natural language processing, AI and statistical analysis in addition to the Power Index. They provide at-a-glance data and help fans understand key factors affecting win predictions. Match Insights are broken down into three sections: “In the Media,” “By the Numbers” and, new at this year’s Open, “Win Factors.”
The “In the Media” section presents qualitative player information that was extracted using Watson Discovery, as described in the Power Index section above. For use in Match Insights, extracted sentences are evaluated for grammatical coherence and topic alignment, then sent to human operators to review and approve.
The “By the Numbers” section gives fans statistical insight into the forthcoming match. It draws contrasts by highlighting the most differentiated stats between the two players, then converts these stats into natural language.
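One way to picture the “By the Numbers” step is to rank the stats both players share by how far apart they are, then fill a sentence template. This is a rough sketch under stated assumptions: the relative-difference measure, the function names, the sample stats and the sentence template are all hypothetical, and the article does not describe how differentiation is actually scored.

```python
def most_differentiated(stats_a, stats_b, top_n=3):
    """Rank the stats both players share by relative difference, largest first."""
    shared = stats_a.keys() & stats_b.keys()

    def rel_diff(key):
        a, b = stats_a[key], stats_b[key]
        denom = max(abs(a), abs(b))
        return abs(a - b) / denom if denom else 0.0

    # Sort by descending difference; break ties alphabetically for stable output.
    return sorted(shared, key=lambda k: (-rel_diff(k), k))[:top_n]


def to_sentence(stat, name_a, val_a, name_b, val_b):
    """Turn one stat comparison into a plain-language sentence."""
    if val_a >= val_b:
        hi_name, hi_val, lo_val = name_a, val_a, val_b
    else:
        hi_name, hi_val, lo_val = name_b, val_b, val_a
    return f"{hi_name} has the higher {stat}: {hi_val:g} vs. {lo_val:g}."


# Made-up sample values for two hypothetical players.
stats_a = {"aces per match": 12, "first-serve %": 64, "break points won %": 40}
stats_b = {"aces per match": 4, "first-serve %": 62, "break points won %": 45}

top_stats = most_differentiated(stats_a, stats_b, top_n=2)
sentence = to_sentence("aces per match", "Player A", 12, "Player B", 4)
```

The template only says which value is higher. It does not judge which is better, since for some stats, such as double faults, a lower number is the advantage.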
“Win Factors” reveals the reasoning behind Match Insights. It explains how the AI model works and why a player is predicted to win by listing the top three factors, such as Power Index rating, winning record on this surface and head-to-head record. Win Factors are evaluated with AI Explainability 360, an open-source toolkit for understanding machine learning models.
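Listing the top three factors can be sketched as a selection over per-factor contribution scores. The example below assumes such scores have already been produced by some explainability method. It does not call AI Explainability 360, and the factor values and the `top_win_factors` helper are hypothetical.

```python
def top_win_factors(contributions, n=3):
    """Return the n factors contributing most toward the predicted winner.

    `contributions` maps factor name to a signed score, where a positive
    score favors the predicted winner (an assumption of this sketch).
    """
    positive = {name: score for name, score in contributions.items() if score > 0}
    return sorted(positive, key=positive.get, reverse=True)[:n]


# Made-up contribution scores for illustration only.
contributions = {
    "Power Index rating": 0.31,
    "Winning record on this surface": 0.18,
    "Head-to-head record": 0.12,
    "Recent win margin": 0.05,
    "Rank differential": -0.04,
}

win_factors = top_win_factors(contributions)
```

Factors with negative scores are excluded because they argue against the predicted winner, so they would not help explain the prediction.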
In tennis, trusted, explainable AI lends validity to predictions around a spectator sport. But IBM solutions for trustworthy AI are also employed in higher-stakes environments. For example, Neighborhood Trust Financial Partners, which helps workers reduce and avoid predatory debt and build savings, uses the AI Explainability 360 toolkit to provide transparency and explainability on how AI algorithms will respond to their financial decisions.
IBM is using AI to transform data into insight in a variety of industries — helping Lufthansa agents address 100,000 customer queries a year, informing public health decisions for the state of Rhode Island and helping Citi auditors skip thousands of hours of manual transcription. Find out how you can unlock the full potential of your organization’s data with IBM.