1. Getting started with data engineering

We are living in the era of “big data”, where data is the new gold. With tons of data piling up, there’s an ever-growing need to harvest this data and transform it into actionable insights to boost businesses.

At the core of the modern data team are data engineers, data scientists, AI engineers and data analysts. Each role solves a different problem and they all work hand in hand to deliver valuable insights that help businesses achieve their vision.

Data engineering is all about collecting all types of data (structured, semi-structured and unstructured) at various latencies (batch and streaming), then transforming, validating and storing it for analysis. In other words, it means building and maintaining data pipelines that ingest, process, store and deliver data in cloud, on-premises or hybrid environments.
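To make the pipeline stages a bit more concrete, here is a minimal sketch in plain Python. The sample data, field names and function names are all made up for illustration; a real pipeline would read from actual sources and write to real storage.

```python
import csv
import io
import json

# Hypothetical sample inputs: a structured CSV extract and semi-structured JSON lines.
CSV_SOURCE = "player,rank\nAlice,1\nBob,2\n"
JSON_SOURCE = (
    '{"player": "Cara", "rank": "3", "country": "FR"}\n'
    '{"player": "", "rank": "4"}\n'
)


def ingest():
    """Collect raw records from both sources as plain dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += [json.loads(line) for line in JSON_SOURCE.splitlines() if line.strip()]
    return rows


def transform(rows):
    """Normalize every record to the same shape and types."""
    return [{"player": r["player"].strip(), "rank": int(r["rank"])} for r in rows]


def validate(rows):
    """Keep only records that have a player name and a positive rank."""
    return [r for r in rows if r["player"] and r["rank"] > 0]


def run_pipeline():
    """Ingest, transform and validate, returning data ready to store or analyze."""
    return validate(transform(ingest()))


trusted = run_pipeline()
print(trusted)
```

The record with the empty player name is dropped during validation, so only trusted rows move on to the analysis side.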

Data engineers are the backbone of the modern data team. They prepare trusted data that data scientists and data analysts use in their work.

The traditional approach to designing data pipelines is known as ETL (Extract-Transform-Load). The rise of big data and the need to handle all types of data at virtually unlimited scale have shifted this paradigm from ETL to ELT (Extract-Load-Transform).

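In the ELT approach, the main focus is on loading all data (of any type) into a single, centralized, scalable store (such as Azure Data Lake Store) and keeping it there for later consumption. Transformation happens afterwards, inside the target platform, when the data is actually needed.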
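The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: an in-memory SQLite database stands in for the centralized storage, and the table names and sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw records as they arrive from a source system.
RAW = [("alice", "10.5"), ("bob", "n/a"), ("cara", "7")]


def etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE etl_scores (name TEXT, score REAL)")
    cleaned = []
    for name, score in RAW:
        try:
            cleaned.append((name.title(), float(score)))
        except ValueError:
            continue  # rows that fail the transform never reach storage
    conn.executemany("INSERT INTO etl_scores VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load everything as-is, then transform inside the storage engine."""
    conn.execute("CREATE TABLE raw_scores (name TEXT, score TEXT)")
    conn.executemany("INSERT INTO raw_scores VALUES (?, ?)", RAW)
    conn.execute(
        """
        CREATE TABLE elt_scores AS
        SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name,
               CAST(score AS REAL) AS score
        FROM raw_scores
        WHERE score GLOB '[0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)

etl_rows = conn.execute("SELECT name, score FROM etl_scores ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, score FROM elt_scores ORDER BY name").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_scores").fetchone()[0]
```

Both paths end up with the same cleaned rows, but only the ELT path keeps the original records (including the bad one) in storage, so they can be re-transformed later if the rules change.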

Jack of all trades, master of none … 😊 Or master of all? 🤔
Data engineers require a broad set of skills (not limited to the list below):

  • strong understanding of data collection, ingestion and transformation, data storage, data modelling, data governance
  • BI architecture and tools
  • data warehousing / big data skills
  • ETL processes and tools
  • data visualization – Power BI, Tableau
  • proficiency in at least one programming language – Python, R, C#
  • various database platforms (SQL and NoSQL)

The Microsoft Azure data platform provides a wide range of technologies and services that enable data engineers to build modern, large-scale data analytics systems. In fact, Azure is such a major player in the cloud industry, and in the data platform space specifically, that many data engineers specialize in it and build their careers around it.

What’s next
Let’s put our best foot forward! Join me in the next post where I introduce the WTA Insights project.🐳

Want to read more?
Microsoft learning resources and documentation:
Azure for the data engineer
Classify your data
ETL and ELT