Acumen IT Training, Inc.

BIG DATA ETL DATA PIPELINE TRAINING USING DATABRICKS

COURSE DESCRIPTION

  • Big Data Fundamentals and Apache Spark programming
  • ETL Part 1: Data Extraction (with capstone)
  • ETL Part 2: Data Transformation and Loads (with capstone)
  • Data Pipeline – Structured Streaming (with capstone)
  • Databricks ETL Part 3: Production
  • Databricks Delta (with capstone)

COURSE OUTLINE

  • Big Data Fundamentals
  • Data Lake
  • Data Warehouse vs Data Lake
  • RDD
  • Write a basic ETL pipeline using the Spark design pattern
  • Ingest data using DBFS mounts in Azure Blob Storage/S3
  • Define and apply a user-defined schema to semi-structured JSON data
  • Apply built-in functions to manipulate data
  • Write UDFs that take a single DataFrame column as input
  • Use the interactive Databricks notebook environment
  • Ingest streaming log file data
  • Databricks Delta Architecture
  • Create Table
  • Append Table
Please contact us for the full course outline and schedules, or to book a private class.