Databricks lakehouse a secret weapon for WNBA’s Fever

by Pelican Press

Caitlin Clark and the Indiana Fever were a significant part of what was one of the most electrifying seasons in Women’s National Basketball Association history.

Less visibly, Databricks' lakehouse platform for data management and analytics was a key part of what lifted the Fever to their most successful season since 2016.

Clark, who recently finished a record-setting rookie season with the Fever, is perhaps best known for draining long-range shots and making spectacular passes. Her 769 total points (19.2 per game) and 337 assists (8.4 per game) set league records for a rookie — so did her 223 turnovers. And she helped the Fever improve from 13-27 in 2023 to 20-20 in a 2024 season that ended last Sunday night with the New York Liberty topping the Minnesota Lynx for the Women’s National Basketball Association (WNBA) championship.

But her performance, and the Fever's success, weren't mere happenstance. Nor were they simply the result of a generational player joining the roster.

In the background of both Clark's success and the Fever's overall improvement was Databricks' lakehouse platform, which enabled the Fever to view and analyze data in ways that were previously impossible.

In a sense, Clark’s arrival this past season was fortuitous timing.

Pacers Sports & Entertainment — owner of the Fever along with the NBA’s Indiana Pacers, Gainbridge Fieldhouse where the Fever and Pacers play, the Indiana Mad Ants of the NBA’s G League and minor league baseball’s Reno Aces — only started using Databricks in April 2022.

Before that, the organization's data was an untidy muddle of myriad systems and isolated data stores, according to Jared Chavez, Pacers Sports & Entertainment's manager of data engineering and strategy.

“There was no crossover,” he said.

Clark’s arrival certainly would have had some impact on the Fever’s success regardless of how well the team could analyze data and use it to inform its on-court performance.

But perhaps it wouldn't have been quite as significant. Perhaps a couple of close games would have gone against the Fever instead of for them had the team not been making data-informed decisions throughout those games, well before the drama at the end. Perhaps the Fever's first playoff appearance and first .500 season since 2016 would instead have been their eighth straight losing season.

From problem to solution

Before April 2022, when Pacers Sports & Entertainment made Databricks' lakehouse platform the hub for its entire data management and analytics operation, the organization's most important franchises, the Pacers and the Fever, suffered from isolated data.

The Pacers' basketball-related data was kept on premises, while business-related data such as sales and marketing was stored in a cloud-based data warehouse. The Fever, meanwhile, were not yet analyzing on-court data, and their business-related data was kept in multiple Microsoft Azure Synapse databases that weren't integrated.

"Every single department had data living in one of our warehouses that was cut off from the rest of them, or they didn't have data flowing into anything," Chavez said.

Beyond the integration problem, Pacers Sports & Entertainment's data operations were costly, because the organization was paying for so many disparate systems.

To better enable its teams to compete with other franchises on the court, the organization needed a data ecosystem rather than a series of isolated tools. To improve its financial bottom line, it needed to get a better return on its investment from its data operations.

The company decided to start over on a new system.

Pacers Sports & Entertainment, parent company of the NBA's Pacers and WNBA's Fever, is using Databricks to fuel both its on-court data analysis and its enterprise analytics operations.

In retrospect, Databricks seems like a logical choice for Pacers Sports & Entertainment. One of the advantages of the vendor's lakehouse platform is that it was designed to let large organizations centralize their structured and unstructured data so that it can be integrated, according to Alexander Booth, a solutions architect at Databricks.

Sports franchises collect not only large amounts of traditional structured data on the business side, such as financial records and point-of-sale transactions, but also petabytes of unstructured data, such as motion capture data, on the performance side. The performance side also produces structured data, including traditional statistics such as home runs in baseball, points in basketball, or goals in hockey and soccer, which needs to be integrated with the unstructured data to develop a more complete view of what is taking place during competition.

As a result, Databricks makes sense for an organization like Pacers Sports & Entertainment.

“Databricks excels in sports analytics due to its capacity to handle large volumes of diverse data types in real time,” Booth, who previously was assistant director of baseball R&D for Major League Baseball’s Texas Rangers, said. “This includes … complex data sources such as biomechanics, videos and detailed reports with natural language [that can be integrated to feed] machine learning and AI models.”

Despite the data integration that Databricks' lakehouse enables, the platform was not Pacers Sports & Entertainment's first choice in 2022. Chavez, who joined the organization in early 2022, was familiar with Snowflake, perhaps Databricks' closest rival.

Starting with Snowflake, however, required a bigger initial investment than Pacers Sports & Entertainment wanted to make, according to Chavez. Databricks, meanwhile, offers pay-as-you-go pricing, so, based on finances as much as anything else, the organization decided to migrate its data operations to the vendor's lakehouse platform.

“At the time, it was a money decision,” Chavez said. “We were hemorrhaging on our old infrastructure, and we didn’t have the money to go with what I was familiar with at the time, which was Snowflake. Looking back, I’m glad we didn’t.”

Data analysis with Databricks

Pacers Sports & Entertainment is not yet using Databricks to analyze the Pacers’ on-court performance.

It is, however, using Databricks' lakehouse platform to store, integrate and analyze most of the organization's other data. In total, in the 30 months since Pacers Sports & Entertainment scrapped its inefficient data management and analytics operations, the company has migrated or rebuilt more than 60 systems on Databricks.

When Pacers Sports & Entertainment first started using Databricks for some of its data operations, the results were almost immediate, according to Chavez.

Most significantly, given that the company moved from myriad systems to Databricks at least partly for financial reasons, migrating certain operations to Databricks' lakehouse cut their cost by 70% compared with the way those operations had previously been run.

Those cost savings led to buy-in from organizational executives. Buy-in subsequently resulted in expanded use of Databricks and, eventually, the complete overhaul of the organization’s data management and analytics operations to make the Databricks lakehouse its hub.

“That’s what ended up snowballing and leading us to stick around to see how it goes [with Databricks],” Chavez said.

Now, though the Pacers aren’t yet benefiting from the organization’s migration to Databricks, the Fever are.

The WNBA uses a tracking system from Second Spectrum to provide its franchises with motion capture data that enables teams to analyze the movements of every player at every instant they’re on the court by essentially photographing each player 30 times per second.
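To make the 30-samples-per-second figure concrete, here is a minimal Python sketch that turns one player's consecutive positions into distance covered and average speed. It assumes a simplified, hypothetical trace of (x, y) court coordinates in feet sampled at a fixed 30 Hz; the article does not describe Second Spectrum's actual data format, and the function name `distance_and_speed` is illustrative only.

```python
import math
from typing import Sequence, Tuple

# Sampling rate taken from the article's "30 times per second" figure.
SAMPLES_PER_SECOND = 30


def distance_and_speed(positions: Sequence[Tuple[float, float]],
                       hz: int = SAMPLES_PER_SECOND) -> Tuple[float, float]:
    """Return (total distance, average speed) for one player's position trace.

    `positions` is a hypothetical list of (x, y) court coordinates in feet,
    one entry per frame, in time order.
    """
    if len(positions) < 2:
        return 0.0, 0.0
    # Sum the straight-line distance between each pair of consecutive frames.
    total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    # n frames span (n - 1) intervals of 1/hz seconds each.
    elapsed = (len(positions) - 1) / hz
    return total, total / elapsed


# A player moving 0.5 ft per frame for one second (31 frames = 30 intervals).
trace = [(0.5 * i, 0.0) for i in range(31)]
print(distance_and_speed(trace))  # (15.0, 15.0): 15 ft covered at 15 ft/s
```

For scale, under this assumption a player on the floor for an entire 40-minute WNBA game would generate 40 × 60 × 30 = 72,000 position samples, which gives a sense of the data volume involved.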

Using that data in concert with the right platform, teams can go far beyond basic statistics such as points, rebounds, assists, points in the paint and points from beyond the 3-point arc. Those numbers are backward-looking and show only what happened; tracking data lets teams ask and understand why it happened.

The Fever load their Second Spectrum data into Databricks' lakehouse. With Databricks as the hub for data management and data engineering, the team is able to ask why things are happening on the court and analyze motion capture data to reach conclusions.

For example, the team wanted to know not only how many touches near the basket it wasn’t converting into points but also why those possessions right in the paint weren’t resulting in baskets. Another question it had was how many possessions started with a steal but weren’t converted into fast-break points and why those possessions didn’t result in an easy basket.
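The "how many" half of those two questions is a filter-and-count over possession records, which a minimal Python sketch can show. Everything here is a hypothetical assumption rather than the Fever's actual schema: the `Possession` fields, the start labels, and especially the 8-second fast-break cutoff, since the article does not say how the team defines a fast break. Plain Python lists stand in for what would more likely be a Spark table in a lakehouse.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: points count as "fast-break" points if they come within
# this many seconds of the possession starting. Not a league definition.
FAST_BREAK_SECONDS = 8.0


@dataclass
class Possession:
    start_type: str                    # illustrative labels: "steal", "rebound", ...
    had_paint_touch: bool              # did the ball get into the paint?
    points: int                        # points scored on the possession
    seconds_to_score: Optional[float]  # None if no points were scored


def is_fast_break_score(p: Possession, cutoff: float = FAST_BREAK_SECONDS) -> bool:
    """True if the possession scored within the assumed fast-break window."""
    return p.points > 0 and p.seconds_to_score is not None and p.seconds_to_score <= cutoff


def unconverted_steals(possessions: List[Possession]) -> List[Possession]:
    """Possessions that began with a steal but produced no fast-break points."""
    return [p for p in possessions
            if p.start_type == "steal" and not is_fast_break_score(p)]


def paint_touch_conversion(possessions: List[Possession]) -> float:
    """Share of possessions with a paint touch that produced any points."""
    touched = [p for p in possessions if p.had_paint_touch]
    if not touched:
        return 0.0
    return sum(1 for p in touched if p.points > 0) / len(touched)


sample = [
    Possession("steal", False, 2, 4.5),    # quick layup off a steal
    Possession("steal", True, 0, None),    # steal, but no score
    Possession("steal", False, 3, 15.0),   # scored, but only in the half court
    Possession("rebound", True, 2, 12.0),  # paint touch converted
]
print(len(unconverted_steals(sample)))  # 2
print(paint_touch_conversion(sample))   # 0.5
```

Counts like these only flag the possessions; answering "why" is where the tracking data comes in, for example by joining the flagged possessions to player positions at the moments that matter.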

“We had a lot of questions come through that were like, ‘Why are we the lowest in the league in this thing or in the bottom half of the league in some other specific area?'” Chavez said. “Now we can answer those questions.”

That’s just the start, he continued. The WNBA only began providing its teams with Second Spectrum data about two weeks before the start of the 2024 season. That gave the Fever and other franchises little time to develop models using the motion capture data to ask questions and analyze on-court performance.

Beyond motion capture data, Databricks is the Fever's system for feeding data to television and radio commentators to inform broadcasts and for developing more traditional data assets, such as pregame and postgame reports for coaches and players, among other things.

Those assets include refined data, such as missed opportunities to challenge bad calls, long 2-point shots that could have been 3-pointers had the shooter stepped back, and missed boxouts on rebounds, as well as traditional statistics such as points, rebounds, assists, turnovers and field goal percentage.
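A "long 2" flag of the kind described above can be sketched as a distance check on shot locations. This minimal Python example is an illustration, not the Fever's method: it assumes shot coordinates in feet relative to the hoop, treats the 3-point line as a simple circle (ignoring the shorter corner distance), and uses a 22.15 ft line distance and a 2 ft "step back" margin that are both assumptions to check against official court specifications.

```python
import math
from typing import List, Tuple

# Assumed 3-point distance in feet at the top of the arc (verify against the
# league's court dimensions); corner geometry is ignored in this sketch.
THREE_POINT_FT = 22.15
# Hypothetical "one step back" margin in feet.
STEP_BACK_FT = 2.0


def shot_distance(x: float, y: float, hoop: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Straight-line distance from a shot location to the hoop, in feet."""
    return math.dist((x, y), hoop)


def long_twos(shots: List[Tuple[float, float]],
              line_ft: float = THREE_POINT_FT,
              margin_ft: float = STEP_BACK_FT) -> List[Tuple[float, float]]:
    """Return 2-point attempts taken within `margin_ft` inside the arc."""
    return [s for s in shots
            if line_ft - margin_ft <= shot_distance(*s) < line_ft]


shots = [(0.0, 21.0), (0.0, 10.0), (0.0, 23.0), (3.0, 20.0)]
print(long_twos(shots))  # [(0.0, 21.0), (3.0, 20.0)]
```

Keeping the line distance and margin as parameters makes it easy to swap in exact court geometry or a different definition of "one step back."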

“The coaches are the heaviest users by a long shot,” Chavez said. “The questions that we’re seeing being asked and being relayed to my team are about that more nuanced stuff like, ‘Why is it that when teams drive the middle [of the lane], we’re not stopping it?'”

Databricks, meanwhile, plays a role in assisting the Fever and other customers with unique data management and analytics needs, working with them to train personnel on the platform as well as develop the data and machine learning tools that inform decisions, according to Booth.

“Databricks offers … the power to rapidly build custom AI and machine learning models that can reason on sports teams’ unique data,” he said. “Sports teams typically seek quick and cost-effective data insights, often with lean teams. Databricks partners with franchises to decrease time to insight while empowering upskilling.”

Past problems

Pacers Sports & Entertainment realized immediate savings after switching its data operations to Databricks and has since used the lakehouse platform for multiple purposes, including better understanding the performance of the Fever. But not everything was perfect from the start, according to Chavez.

Databricks has long been good at enabling the use of analytics and machine learning to inform decisions, he noted. But from an engineering perspective, it wasn’t as strong in 2022 as some other platforms.

“The orchestrator was a far cry from most things on the market at the time — though, granted, it was their attempt to start getting headway,” Chavez said. “It was choppy for a bit.”

A significant development that made engineering easier was the Databricks Unity Catalog, a data catalog that enables organizations to organize and govern their data.

“They have found their stride in the last year or year-and-a-half, especially on the engineering front,” Chavez said. “The tools continue to improve, and things are getting a lot better. But there are still some quality-of-life fixes that [Databricks needs to make] to catch up to platforms that have been around [longer].”

One remaining hurdle for many users is Apache Spark, the processing engine at the core of Databricks' lakehouse platform, he continued. For those who have used data warehouses in the past and are familiar with traditional databases, working in Spark can feel completely foreign.

However, once users learn Spark, the benefits far outweigh any initial struggles.

“The learning curve and ceiling is astronomical for this platform, especially if you’re coming from a traditional background,” Chavez said. “You don’t have to know Spark to use Databricks. But if you don’t, you’re doing yourself a disservice because you can make this platform stupidly efficient and incredibly powerful.”

Databricks, meanwhile, has introduced tools over the past year aimed at making its platform easier to use, according to Bryan Saftler, global industry leader at Databricks.

They include AI-powered tools that enable conversational interactions with data and free access to Databricks Academy, a set of training resources to help new users learn needed skills.

"Databricks has always been committed to democratizing access to more individuals across organizations, whether they are technical or not," Saftler said. "To this effect, Databricks takes a multi-pronged approach."

Looking to the future

With problems mostly in the past for Pacers Sports & Entertainment, Databricks' lakehouse now underpins most of the organization's data operations, including its Salesforce ecosystem, security planning, marketing strategy and on-court analysis.

Looking ahead, the organization has ambitious plans for its future use of the vendor’s platform, including developing virtual reality models, Chavez said.

The organization captures location data as fans move through Gainbridge Fieldhouse and the surrounding property. Using that data, and by partnering with a research lab at Indiana University, Pacers Sports & Entertainment is planning to develop a virtual three-dimensional model of the Fieldhouse campus that shows the flow of traffic during events.

With the model, analysts will be able to see what fans are seeing to decide where best to place advertising. They will also be able to spot choke points in the flow of traffic to decide which gates to open and when, and where to place security personnel.

Regarding on-court analysis, one application of virtual reality will be to enable the Fever to not only analyze the movement of players but also view their movement from different angles.

For example, when action near the basket involves a crowd of bodies and it's difficult to tell exactly what is happening, virtual reality will let an analyst shift the viewing perspective. Using a simulation, the analyst will essentially be able to go inside the scrum to see the action more clearly.

Another application of virtual reality will be to help the Fever better observe games from afar, such as action in overseas leagues.

Scouts can't be everywhere, so the Fever are collecting data from over 50 competitions worldwide to feed virtual reality models that will let the team see more than just game tape as it looks for players to add to the roster. The models will also let coaches and front-office staff track current Fever players who spend winters playing overseas. Because the WNBA season runs only from May to October, and the league's average salary of $102,000 is dramatically lower than the NBA average of $9.6 million, players often spend winters in overseas leagues.

“The goal is to simulate the game so when you say the team or a player is good at something or not good at something, we can see what it actually looks like,” Chavez said. “Now you can know what it looks like outside of just the stats.”

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.