WTA Insights Project, Azure Databricks

3. Notebooks with Python – part 2

In this post, we will create a third notebook to prep and transform the wta_matches CSV files. Matches Notebook: Launch the Databricks portal and create a new cluster, as shown in the previous post. Name the cluster matchesNotebook. Recall from the previous posts on ADF that we have ingested the wta_matches files (53 in total) in …

2. Notebooks with Python – part 1

Now that we have provisioned a Databricks workspace and have created a Spark cluster, it is time to get spinning by writing our first notebook. A notebook is a collection of cells. These cells are run to execute code, to render formatted text, or to display graphical visualizations. A Databricks notebook cell can execute Python, …

1. Creating a Databricks service, workspace and cluster

The next step in our WTA Insights data journey is to cleanse and transform the tennis_wta files that we have ingested into our data lake. The plan is to use Databricks to prep the CSV files and then store them back on the data lake, in the cleansed layer, ready for Power BI to consume …

Databricks Series

Azure Databricks is a cloud-based big data and machine learning platform based on Apache Spark. With fully managed Spark clusters, it can process large data workloads and supports APIs for R, SQL, Python, Scala and Java. Apache Spark is an open-source distributed computing environment that can analyze big data using SQL (Spark SQL), machine …