belugaboba

1. Provisioning Azure Data Factory

The next piece of the puzzle is to fetch the CSV files from the tennis_wta repository on GitHub. For that we need to prepare another Azure resource – a data factory. On the Azure portal, select + Create a resource in the upper left-hand corner, then do a quick search for data factory. On the… Continue reading “1. Provisioning Azure Data Factory”

5. Data lake zoning

We use namespaces or containers to organize data in the data lake. Inside containers we have a hierarchical structure of files and folders, similar to a traditional OS file system. In fact, containers in ADLS Gen2 are also referred to as file systems. The file system driver that ADLS Gen2 utilizes – Azure… Continue reading “5. Data lake zoning”

4. Setting things up – Azure Portal

First things first, if you haven’t done so already, you will need to create your free Azure account to get started. You get an initial $200 Azure credit and 30 days to use it on Azure services. Beyond that, Azure is a pay-as-you-go service, so make sure to make the most out of… Continue reading “4. Setting things up – Azure Portal”

3. Modern data architecture in Azure – Case study: WTA Insights

Modern data platforms are capable of handling today’s data challenges easily and effectively. The Azure data platform comprises a wide set of services and technologies that provide the functionality needed to handle high volumes and a high variety of data at all velocities, while ensuring high scalability, availability and performance. Traditional data pipelines use the ETL… Continue reading “3. Modern data architecture in Azure – Case study: WTA Insights”

2. WTA Insights – project overview

You like data, you like Azure, and you want to learn more about data engineering with Azure. As a data engineer, you will work in different domains across your career, and the way you process and examine data will vary as business goals vary per industry. Finance, insurance, healthcare, entertainment, retail, sports,… Continue reading “2. WTA Insights – project overview”

1. Getting started with data engineering

We are living in the era of “big data”, where data is the new gold. With tons of data piling up, there’s an ever-growing need to harvest this data and transform it into actionable insights to boost businesses. At the core of the modern data team are data engineers, data scientists, AI engineers and… Continue reading “1. Getting started with data engineering”

Power BI Series

In this Power BI series we embark on the journey of telling the WTA story with data. 🎾🏆 We will learn how to load, combine and transform the data in Power Query, how to build a data model and write DAX measures. We will bring all these elements together in Power BI visuals and reports… Continue reading “Power BI Series”

Databricks Series

Azure Databricks is a cloud-based big data and machine learning platform based on Apache Spark. With fully managed Spark clusters, it can process large data workloads and supports APIs for R, SQL, Python, Scala and Java. Apache Spark is an open-source distributed computing environment that can analyze big data using SQL (Spark SQL), machine… Continue reading “Databricks Series”

ADF Series

Azure Data Factory (ADF) is Azure’s cloud-managed service for ETL and ELT processes. It is similar to SSIS, but in the cloud. ADF can connect to various data sources, on-premises or in the cloud. If the data you want to access is on-premises, you will need to configure a data management gateway to connect to… Continue reading “ADF Series”

Project Kickoff

An end-to-end Azure Data Engineering project that will walk you through various technologies and services in the Azure landscape. 🌊🐳🌊 Posts in the kickoff series: 1. Getting started with data engineering 2. WTA Insights – project overview 3. Modern data architecture in Azure – Case study: WTA Insights 4. Setting things up – Azure Portal 5.… Continue reading “Project Kickoff”