E2E Project
Concepts we will learn¶
Here you will understand two concepts of ETL:
Medallion architecture for data management: Star Schema:
Project summary¶
Here we will first import data(few .csv files) into bronze layer(Simple folders). Then clean and transform it and load it into Silver Delta Table. Then create star schema and load the data from silver to Gold Delta Tables.
You have successfully taken data from your bronze layer, transformed it, and loaded it into a silver Delta table. Now you’ll use a new notebook to transform the data further, model it into a star schema, and load it into gold Delta tables.
In data processing, you often hear about Bronze, Silver, and Gold layers, which are part of the Medallion Architecture.
Here I will show the use of Medallion architecture in a simple straightforward ETL project:
- Import data files into the Bronze layer (Lakehouse folder).
- Define a schema and create a dataframe from the .csv files.
- Clean the data and add new columns as needed.
- Manually create a Silver Delta Table with the same schema as the dataframe.
- Perform an upsert operation on the Delta table, which means updating existing records based on certain conditions and adding new records if no match is found.
- Explore data in the silver layer using the SQL endpoint