3. Modern data architecture in Azure – Case study: WTA Insights

Modern data platforms are designed to handle today’s data challenges effectively. The Azure data platform comprises a wide set of services and technologies that handle high volumes and a high variety of data, at any velocity, while ensuring scalability, availability and performance.

Traditional data pipelines use the ETL process to Extract, Transform and Load data, but this approach has a major shortcoming: it is ineffective at processing large amounts of data.

The massive and continuous flow of data from a variety of sources, together with the emphasis on data accuracy and value, has driven the development of cloud data platform technologies. This has led data engineers to adopt a new approach to data processing – the ELT (Extract-Load-Transform) approach.

In traditional data warehousing, data is extracted from various sources, modeled and transformed into the right format, and then loaded into a data repository for historical reporting and data analysis. This is often referred to as a top-down approach.

With big data, the focus is on capturing data – any kind of data, from any source – and storing it for later analysis. There is no need to define a data structure or any relationships at this point. We want to make sure that data is captured in a storage area (the data lake) and persisted until it is needed. Once all the data has been brought into the data lake, we can pull the relevant data, model it and analyze it whenever needed. This is often referred to as a bottom-up approach.
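To make the capture-first idea concrete, here is a minimal Python sketch: raw records are stored exactly as they arrive, and structure is applied only when a question is asked. The in-memory `data_lake` dictionary, the file name, the column names and the sample rows are all made-up stand-ins for a real data lake and real data.

```python
import csv
import io

# Hypothetical stand-in for a data lake: path -> raw file content, stored untouched.
data_lake = {}

def load_raw(path, raw_text):
    """Extract + Load: persist the data as-is, with no schema and no cleaning."""
    data_lake[path] = raw_text

def matches_won_by(path, player):
    """Transform at read time: parse the raw CSV and apply structure only now."""
    reader = csv.DictReader(io.StringIO(data_lake[path]))
    return sum(1 for row in reader if row["winner"].strip() == player)

# Made-up sample rows; the stray whitespace is loaded unchanged on purpose.
raw = (
    "winner,loser,score\n"
    "Player A ,Player B,6-3 6-4\n"
    "Player C,Player A,7-5 6-2\n"
    "Player A,Player C,6-1 6-0\n"
)
load_raw("raw/matches_sample.csv", raw)

wins = matches_won_by("raw/matches_sample.csv", "Player A")
print(wins)
```

Because the raw file is kept intact, a different question later (for example, losses per player) can be answered from the same stored data without re-ingesting it.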

Big data and data warehousing are two different solutions. An organization can implement either one or a combination of the two, depending on its needs.

Case study: WTA Insights

Now the fun part – let’s look at the proposed architecture for WTA Insights!

Source: the tennis_wta dataset consists of several CSV files: one matches file per year, a players file and historical rankings. The CSV files are located in a GitHub repository.

Architecture

We design the Azure data project in phases that reflect the ELT approach. We identify the technologies and services associated with each of the following phases: Ingest, Store, Prep and Train, Model and Serve, Consume.

Diagram: Big data/data lake/ELT/modern data warehouse architecture on Azure (image by author)

Data flow

Get the CSV files from GitHub using Azure Data Factory and save them in Azure Data Lake Storage. Use notebooks in Azure Databricks to clean and transform the data.
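The ingest step can be thought of as a list of copy tasks, one per source file. The sketch below builds such a list in plain Python; it is not Azure Data Factory code. The base URL, the file-name pattern, the `raw` folder and the year range are placeholders invented for illustration, so replace them with the actual repository layout.

```python
from dataclasses import dataclass

# Placeholder values: the real repository URL and file names may differ.
BASE_URL = "https://example.com/tennis_wta"
RAW_FOLDER = "raw"

@dataclass(frozen=True)
class CopyTask:
    source_url: str  # where the CSV file is read from
    sink_path: str   # where it lands in the data lake, unchanged

def build_copy_tasks(first_year, last_year):
    """One task per yearly matches file, plus the players and rankings files."""
    names = [f"matches_{year}.csv" for year in range(first_year, last_year + 1)]
    names.extend(["players.csv", "rankings.csv"])
    return [CopyTask(f"{BASE_URL}/{name}", f"{RAW_FOLDER}/{name}") for name in names]

tasks = build_copy_tasks(2019, 2021)
for task in tasks:
    print(task.source_url, "->", task.sink_path)
```

In Azure Data Factory, a list like this would typically drive a parameterized copy activity, so that adding a new year means adding one entry rather than a new pipeline.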

There are different paths we could take further:

One option is to load the transformed data into Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) and build the Power BI reports on top of it. This is the option you would choose for large enterprise solutions.

For the WTA Insights project, we will write the transformed data back to the data lake in a clean folder and do the modeling part in Power BI instead. This works very well for individual solutions with smaller workloads, and WTA Insights is a good use case.
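As a rough illustration of the "clean folder" idea, the sketch below trims whitespace, drops incomplete rows and derives the clean path from the raw one. It uses only the Python standard library rather than Spark, and the `raw/` and `clean/` folder names, the column names and the sample rows are assumptions made for the example.

```python
import csv
import io

def clean_path(raw_path):
    """Map a raw-zone path to its clean-zone counterpart (folder names are assumptions)."""
    prefix = "raw/"
    if not raw_path.startswith(prefix):
        raise ValueError(f"expected a path under {prefix!r}: {raw_path!r}")
    return "clean/" + raw_path[len(prefix):]

def clean_csv(raw_text):
    """Trim whitespace and drop rows with any empty field; return clean CSV text.

    Assumes no row has more fields than the header.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if all(trimmed.values()):  # keep only rows where every field has a value
            writer.writerow(trimmed)
    return out.getvalue()

# Made-up sample: one row has stray whitespace, one is missing a value.
raw = "player,country\n Player A ,AAA\nPlayer B,\nPlayer C,CCC\n"
clean = clean_csv(raw)
target = clean_path("raw/players.csv")
print(target)
print(clean)
```

The raw file is never modified; the cleaned copy lives beside it, so the transformation can be rerun or changed later without fetching the source again.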

Components

We will use the following Azure products:

Azure Data Lake Storage – Massively scalable and secure data lake for high performance analytics workloads.

Azure Data Factory – Fully managed, serverless data integration solution to create, schedule and orchestrate ETL and ELT workflows.

Azure Databricks – Fast, easy and collaborative Apache Spark based analytics service.

Power BI – A suite of business analytics tools that deliver insights from hundreds of data sources, simplify data prep, and drive ad hoc analysis. Publish reports and consume them on the web and across mobile devices.
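As a compact summary, the phases from the Architecture section can be mapped to the products chosen for WTA Insights. The mapping below follows the data flow described above (with modeling done in Power BI rather than Azure Synapse); the `PIPELINE` name and its layout are only an illustration and are not part of any Azure API.

```python
# Phase -> product, in pipeline order, for the WTA Insights option described above.
PIPELINE = [
    ("Ingest", "Azure Data Factory"),
    ("Store", "Azure Data Lake Storage"),
    ("Prep and Train", "Azure Databricks"),
    ("Model and Serve", "Power BI"),
    ("Consume", "Power BI"),
]

def products_used(pipeline):
    """Return the distinct products in the order they are first used."""
    seen = []
    for _, product in pipeline:
        if product not in seen:
            seen.append(product)
    return seen

for phase, product in PIPELINE:
    print(f"{phase:<16} {product}")

products = products_used(PIPELINE)
```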

We are going to build WTA Insights from scratch. The rest of the blog is organized as a step-by-step tutorial series that walks you through each Azure product in the suggested solution.

What’s next
Check the next post for a list of prerequisites to get things started.🐳

Want to read more?
Microsoft learning resources and documentation:
Azure data platform end-to-end
Big data architectures
Modern data warehouse architecture