Skip to content

E2E Project

Concepts we will learn

Here you will understand two concepts of ETL:

Medallion architecture for data management: Star Schema:

Project summary

Here we will first import data(few .csv files) into bronze layer(Simple folders). Then clean and transform it and load it into Silver Delta Table. Then create star schema and load the data from silver to Gold Delta Tables.

You have successfully taken data from your bronze layer, transformed it, and loaded it into a silver Delta table. Now you’ll use a new notebook to transform the data further, model it into a star schema, and load it into gold Delta tables.

In data processing, you often hear about Bronze, Silver, and Gold layers, which are part of the Medallion Architecture.

Medallion Architecture Diagram

Here I will show the use of Medallion architecture in a simple straightforward ETL project:

  1. Import data files into the Bronze layer (Lakehouse folder).
  2. Define a schema and create a dataframe from the .csv files.
  3. Clean the data and add new columns as needed.
  4. Manually create a Silver Delta Table with the same schema as the dataframe.
  5. Perform an upsert operation on the Delta table, which means updating existing records based on certain conditions and adding new records if no match is found.
  6. Explore data in the silver layer using the SQL endpoint