Data Engineering using Databricks features on AWS and Azure
Build Data Engineering Pipelines using Databricks core features such as Spark, Delta Lake, cloudFiles, etc.
What you’ll learn
- Data Engineering leveraging Databricks features
- Databricks CLI to manage files, Data Engineering jobs and clusters for Data Engineering Pipelines
- Deploying Data Engineering applications developed using PySpark on job clusters
- Deploying Data Engineering applications developed using PySpark using Notebooks on job clusters
Requirements
- Programming experience using Python
- Data Engineering experience using Spark
- Ability to write and interpret SQL Queries
- This course is ideal for experienced data engineers who want to add Databricks as a key skill in their profile
As part of this course, you will learn Data Engineering using Databricks, a cloud platform-agnostic technology.
About Data Engineering
All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc.
Databricks is one of the most popular cloud platform-agnostic data engineering tech stacks. The company was founded by the original creators of Apache Spark, and its engineers remain major committers to the project. The Databricks runtime provides Spark while leveraging the elasticity of the cloud, and you pay only for what you use. Here are some of the core features of Databricks:
- Spark – Distributed computing
- Delta Lake – Perform CRUD operations on data lake tables with ACID guarantees
- cloudFiles – Ingest new files incrementally and efficiently, leveraging cloud-native features
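To give a flavor of the Delta Lake feature above, here is a minimal sketch of CRUD operations using Spark SQL on a Delta table. Table and column names (`orders`, `updates`, `order_id`, `status`) are hypothetical, and the statements assume a Databricks cluster (or a Spark session with Delta Lake configured):

```sql
-- Create a Delta table and exercise full CRUD (hypothetical table/columns)
CREATE TABLE orders (order_id INT, status STRING) USING DELTA;

INSERT INTO orders VALUES (1, 'NEW');

UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1;

DELETE FROM orders WHERE status = 'CANCELLED';

-- Upsert changes from a staging table named updates
MERGE INTO orders t
USING updates s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Unlike plain Parquet, Delta Lake supports `UPDATE`, `DELETE`, and `MERGE` directly, which is what makes these CRUD-style pipelines practical on a data lake.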
As part of this course, you will be learning Data Engineering using Databricks.
- Getting Started with Databricks
- Setup Local Development Environment to develop Data Engineering Applications using Databricks
- Using the Databricks CLI to manage files, jobs, clusters, etc., related to Data Engineering Applications
- Spark Application Development Cycle to build Data Engineering Applications
- Databricks Jobs and Clusters
- Deploy and Run Data Engineering Jobs on Databricks Job Clusters as Python Application
- Deploy and Run Data Engineering Jobs on Job Cluster using Notebooks
- Deep Dive into Delta Lake using Dataframes
- Deep Dive into Delta Lake using Spark SQL
- Building Data Engineering Pipelines using Spark Structured Streaming on Databricks Clusters
- Incremental File Processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles
- Overview of Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications
- Differences between Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications
- Differences between traditional Spark Structured Streaming and leveraging Databricks Auto Loader cloudFiles for incremental file processing.
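As a preview of the Auto Loader modules listed above, here is a minimal PySpark sketch of incremental file ingestion using `cloudFiles`. The paths and table name are hypothetical, and the code assumes it runs on a Databricks cluster where `spark` is the pre-created SparkSession:

```python
# Incrementally ingest JSON files with Auto Loader (cloudFiles).
# Paths and table name below are hypothetical examples.
df = (
    spark.readStream
    .format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")                    # format of incoming files
    .option("cloudFiles.schemaLocation",
            "/mnt/checkpoints/orders/schema")               # where inferred schema is tracked
    .load("/mnt/landing/orders")                            # landing directory to watch
)

(
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders")  # progress tracking
    .trigger(availableNow=True)                               # process available files, then stop
    .toTable("orders_bronze")                                 # write to a Delta table
)
```

By default Auto Loader discovers files via directory listing; setting `cloudFiles.useNotifications` to `true` switches it to the file-notification mode discussed in the modules above.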
We will be adding a few more modules related to PySpark, Spark with Scala, Spark SQL, and Streaming Pipelines in the coming weeks.
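For the Databricks CLI material covered in the course, these are the kinds of commands used to manage files, clusters, and jobs. They assume a workspace has been configured (the job ID shown is a placeholder):

```
# One-time setup: point the CLI at your workspace with a personal access token
databricks configure --token

# Manage files in DBFS
databricks fs ls dbfs:/
databricks fs cp data/orders.json dbfs:/mnt/landing/orders/

# Inspect clusters and jobs
databricks clusters list
databricks jobs list

# Trigger a job run (42 is a placeholder job ID)
databricks jobs run-now --job-id 42
```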
Who this course is for:
- Beginner or Intermediate Data Engineers who want to learn Databricks for Data Engineering
- Intermediate Application Engineers who want to explore Data Engineering using Databricks
- Data and Analytics Engineers who want to learn Data Engineering using Databricks