Data Engineering Course

Learn how to build big data systems from scratch using Python and cloud infrastructure.

Are you interested in data engineering?

You've landed in the right place.

This course will teach you how to build distributed data processing pipelines using open-source frameworks. 

For every component of a typical big data system, we learn the underlying principles, then apply those concepts using Python and SQL-like frameworks to construct pipelines from scratch. 

The course covers:
  • Evolution of Data Processing
  • Types of Big Data Systems
  • Using Docker and Google Cloud Platform
  • Distributed Cache with Redis
  • Parallel Processing with Hadoop
  • Batch Processing with Spark
  • Batch Storage with HBase
  • Data Collection with Flume
  • Messaging with Kafka
  • Stream Processing with Spark & Kafka
  • Real-time Storage with Cassandra
  • Distributed Search & Indexing with the ELK stack
  • Presentation with Grafana & Prometheus

Throughout the Data Engineering course, you'll learn everything you need to build robust data pipelines.

Watch every data engineering concept explained

Over 20 hours of video lessons, filled with concepts and live Python examples covering all major areas of data engineering.

Apply every lesson with challenging exercises

Each unit has a set of exercises for you to test your understanding and build your engineering skills.

Get all your questions answered

Get feedback on issues, further explanation of specific topics, and advice on your capstone project.

Level up with mentorship

Take your learning experience to the next level with 1-on-1 mentorship (launching soon).

What's included?

59 videos, 17 text files


Unit 1 - Introductory Topics
Welcome to the Course
7 mins
Technical Review
6 mins
Lab - Docker & GCP
13 mins
Assignment - Environment Setup
Solution - Environment Setup
9 mins
Evolution of Data Processing
7 mins
Big Data Systems
11 mins
Small Data
5 mins
Introduce Yourself
Unit 2 - Scaling & Processing with Hadoop
20 mins
Shared State & Data
7 mins
Parallel Processing
7 mins
Lab - Threads & Processes
12 mins
Assignment - Shared State Management
Solution - Shared State Management
12 mins
Distributed Cache
9 mins
Lab - Redis
4 mins
Assignment - Shared State with Redis
Solution - Shared State with Redis
9 mins
9 mins
Hadoop Ecosystem
10 mins
Lab - HDFS & Hadoop
8 mins
Assignment - HDFS
Solution - HDFS
7 mins
Master Datasets
32 mins
Lab - Avro in Python
6 mins
Assignment - Hadoop MR and Master Datasets
Solution - Hadoop MR and Master Datasets
13 mins
Unit 3 - Batch Processing
Batch Processing
28 mins
Batch Views
15 mins
Lab - Spark
11 mins
Assignment - Spark Batch Processing and Parquet
Solution - Spark Batch Processing and Parquet
15 mins
Unit 4 - Data Storage
RDBMS Review
8 mins
NoSQL Classification
19 mins
NoSQL Techniques & Requirements
24 mins
Batch Views Storage
13 mins
Lab - HBase
11 mins
Assignment - Batch Views with HBase
Solution - Batch Views with HBase
7 mins
Capstone Project Brainstorming
Interview with a Data Engineer
Unit 5 - Collection & Data Ingestion
Collection Tier
18 mins
Apache Flume
15 mins
Lab - Flume
17 mins
Assignment - Flume
Solution - Flume
9 mins
Unit 6 - Messaging
Messaging Tier
15 mins
Apache Kafka
15 mins
Lab - Apache Kafka
16 mins
Assignment - Messaging Tier with Kafka
Solution - Messaging Tier with Kafka
9 mins
Unit 7 - Stream Processing
Stream Processing
23 mins
Spark Streaming
16 mins
Lab - Spark Streaming
6 mins
Assignment - Stream Processing with Spark
Solution - Stream Processing with Spark
8 mins
21 mins
Lab - Cassandra
7 mins
Assignment - Data Modeling & Processing with Cassandra
Solution - Data Modeling & Processing with Cassandra
8 mins
Unit 8 - Distributed Indexing
Distributed Indexing
12 mins
12 mins
Lab - Elasticsearch & Kibana
11 mins
Assignment - Distributed Search & Indexing
Solution - Distributed Search & Indexing
9 mins
9 mins
Unit 9 - Presentation
Data Access & Visualization
13 mins
Monitoring Pipelines
10 mins
Lab - Grafana and Prometheus
6 mins
What's Next?
Airbnb Case Study
13 mins
9 mins
Big Data Services
4 mins
16 mins
Capstone Project Submission
Career Transition


I'm a total beginner. Is this course for me?

This course assumes a strong command of Python, SQL, and the command line. If you are new to these subjects, you can learn them for free via Codecademy.

Who is this course for?

This course is geared toward individuals with experience in software development, cloud computing, or database administration.

You will gain a strong understanding of how to build big data systems and be ready to deploy projects in production.

Are there any technical requirements?

All code examples are shown on a Mac, but Windows and Linux work just as well: the Docker and cloud infrastructure used in the course run on all three.

I highly recommend having a dual-screen setup so you don't have to switch windows often.

How long do I have access?

Forever. Seriously. After you sign up, you will be able to access the course lessons whenever you like, on any device you own.