You like data, you like Azure, and you want to learn more about data engineering with Azure. As a data engineer, you will work in different domains across your career, and the way you process and look at data will vary as business goals vary by industry.
Finance, insurance, healthcare, entertainment, retail, sports, media – there are so many interesting domains to choose from. Picking a topic that you actually care about will help you go the extra mile on your learning curve.
Domain expertise is the kind of skill set you build as you gain industry experience, but it is as important as your technical expertise. Be proactive on your learning journey and do data exploration projects in different domains.

Project background
It should come as no surprise that for this end-to-end data engineering project I chose a subject that is dear to me: women’s tennis.
Women’s tennis is so enjoyable to watch! Billie Jean King, Steffi Graf, Serena Williams, Simona Halep, Angelique Kerber, Ana Ivanovic, Bianca Andreescu – and the list goes on and on. We could write an entire blog about so many inspiring women in tennis! 🎾🏆
The Women’s Tennis Association (WTA) is the principal organizing body of women’s professional tennis. On its website, it provides information about tournaments, rankings, players and stats.
As with any data engineering project, we need data to work with. Jeff Sackmann has done a wonderful job scraping the WTA’s website and collecting the available data resources into an extensive database of match results. We are going to use his tennis_wta database as the demo dataset in our data engineering project.
Jeff Sackmann’s tennis_wta database is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Please check the License section in the website menu above for more information.
WTA Insights will provide insights about matches and players through analytical dashboards and reports in Power BI.
Currently, the tennis_wta dataset comes in the form of CSV files available for download on GitHub.
Requirements
- Data must be stored in Azure and made available for analysis in Power BI.
- Explore / choose data store and analytical data store options.
- Data will be loaded in Azure using Azure Data Factory.
- Clean and transform CSV files using Azure Databricks.
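To make the "clean and transform" requirement a bit more concrete, here is a minimal sketch of the kind of cleanup a match results file might need. It uses only Python's standard library rather than Databricks, so it is an illustration of the idea and not the project's actual pipeline. The column names, the YYYYMMDD date format, and the sample rows (including the player and tournament names) are made up for this example and are assumptions about what the files could look like.

```python
import csv
import io
from datetime import datetime

# Made-up sample data: the column names and values are assumptions for
# illustration only, not rows taken from the tennis_wta dataset.
SAMPLE_CSV = """tourney_name,surface,tourney_date,winner_name,loser_name
Example Open,Hard,20190701,Player A ,Player B
Example Open,,20190702,Player C,Player D
"""


def clean_row(row):
    """Trim whitespace, turn empty values into None, and parse the date."""
    cleaned = {key: ((value or "").strip() or None) for key, value in row.items()}
    # Assumption: dates are stored as YYYYMMDD strings.
    cleaned["tourney_date"] = datetime.strptime(
        cleaned["tourney_date"], "%Y%m%d"
    ).date()
    return cleaned


def load_matches(text):
    """Parse CSV text into a list of cleaned row dictionaries."""
    return [clean_row(row) for row in csv.DictReader(io.StringIO(text))]


matches = load_matches(SAMPLE_CSV)
for match in matches:
    print(match["tourney_date"], match["winner_name"], match["surface"])
```

In a Databricks notebook the same steps (trimming strings, handling missing values, casting dates) would more likely be expressed as DataFrame transformations, but the cleaning rules themselves stay the same.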
What’s next
Let’s look at modern data architecture in Azure and how it applies to the WTA Insights project.🐳
Want to read more?
Women’s Tennis Association
Billie Jean King’s story on and off the court will truly inspire you to persevere.
Public datasets you can use in your projects: awesomedata