Data Factory in Microsoft Fabric

Background

Data Factory is a core component of Fabric. It builds on the concepts of Azure Data Factory (ADF), but it is a modernized experience integrated into the Fabric platform rather than a 1:1 copy of the old ADF.

Pipelines in Microsoft Fabric

Understand Pipelines

  • A pipeline is a workflow for ingesting and transforming data.
  • Using the GUI, we can build complex pipelines with minimal coding.

Core Pipeline Concepts

Activities are the executable tasks in a pipeline, run in a sequence you control. There are two types:

  • Data Transformation: Transfers and transforms data (e.g., Copy Data, Data Flow, Notebook, Stored Procedure).
  • Control Flow: Implements loops, conditional branching, and manages variables.
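
To make the activity model concrete, here is a minimal sketch of how activities chain together, written as a Python dict loosely modeled on the ADF-style JSON that pipelines use under the hood. All names and property details are illustrative assumptions, not an exact Fabric export:

```python
# Illustrative sketch only: activity names, types, and property placement
# are hypothetical, loosely modeled on the ADF-style pipeline JSON.
# In practice the pipeline canvas generates this definition for you.
pipeline_definition = {
    "name": "IngestSalesData",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                # a Data Transformation activity: moves the raw files
                "name": "CopyRawFiles",
                "type": "Copy",
                "dependsOn": [],  # no predecessor, so it runs first
            },
            {
                # a Control Flow activity: loops over a list of items
                "name": "ForEachRegion",
                "type": "ForEach",
                # only runs after the copy activity succeeds
                "dependsOn": [
                    {
                        "activity": "CopyRawFiles",
                        "dependencyConditions": ["Succeeded"],
                    }
                ],
            },
        ]
    },
}
```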

Parameters

Parameters let you supply specific values for each run, making a single pipeline reusable across different sources, destinations, or date ranges.
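
As a rough sketch of how a parameter is declared and then referenced, using the `@pipeline().parameters.<name>` expression syntax that Fabric pipelines inherit from ADF. This is again an illustrative Python dict; the parameter name, values, and property placement are made up:

```python
# Illustrative sketch: a pipeline parameter plus an activity property that
# references it. "SourceFolder" and its values are hypothetical, and the
# property placement is simplified compared to a real pipeline definition.
parameterized_pipeline = {
    "name": "CopyFromFolder",
    "properties": {
        "parameters": {
            "SourceFolder": {"type": "string", "defaultValue": "landing/"}
        },
        "activities": [
            {
                "name": "CopyFolder",
                "type": "Copy",
                # resolved at run time to whatever value this run was given,
                # so one pipeline definition can serve many folders
                "folderPath": "@pipeline().parameters.SourceFolder",
            }
        ],
    },
}
```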

Pipeline Runs

Pipeline runs are started on demand or on a schedule. Each run is assigned a unique run ID that you can use to track and review that execution.
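
For example, here is a hedged sketch of starting an on-demand run through the Fabric REST API's job scheduler and reading back the run's job-instance URL. The endpoint shape follows the documented on-demand item job API at the time of writing, but verify it against the current Fabric REST docs; the IDs and token are placeholders you must supply yourself:

```python
import requests

# Placeholders: supply your own workspace ID, pipeline item ID, and a
# Microsoft Entra access token (e.g. acquired via MSAL).
WORKSPACE_ID = "<workspace-id>"
PIPELINE_ID = "<pipeline-item-id>"
TOKEN = "<access-token>"

# On-demand job endpoint for a pipeline item, per the Fabric job
# scheduler API at the time of writing.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

# On success the service returns 202 Accepted; the Location header points
# at the job instance, whose URL contains the unique run ID you can poll
# to monitor the execution.
print(resp.status_code, resp.headers.get("Location"))
```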

Canvas for designing pipelines

Fabric offers a canvas where you can build complex pipelines without much coding.

The Copy Data Activity

The Copy Data activity is the most important activity in data pipelines. Some pipelines contain nothing but a single Copy Data activity, and that's all they need.

When to use?

Use the Copy Data activity to move data without transformations, or to import raw data as-is. When you need to transform or merge data as part of ingestion, use a Dataflow Gen2 activity instead: it lets you define multiple Power Query transformation steps and include them in the pipeline.
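
A minimal sketch of what a Copy Data activity boils down to: one source and one sink, with no transformation logic in between. The connector type names and the activity name here are assumptions for illustration only:

```python
# Illustrative settings for a Copy Data activity: data in, data out,
# nothing transformed in between. Type names are assumptions; you would
# normally configure all of this in the canvas or the Copy Data tool.
copy_activity = {
    "name": "CopyCsvToLakehouse",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # e.g. CSV files
        "sink": {"type": "LakehouseTableSink"},     # e.g. a lakehouse table
    },
}
```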

The Copy Data Tool

The Copy Data tool is a wizard that steps you through choosing a source, choosing a destination, and mapping columns, then generates the Copy Data activity for you.

Pipeline Templates

To create a pipeline based on a template, choose Templates on the start page. Fabric then displays a gallery of predefined pipeline templates that you can use as a starting point and customize on the canvas.

Run and monitor pipelines

You can run a pipeline on demand, schedule it, and view its run history, all from the GUI.

Dataflows

A dataflow is a way to ingest and transform data using Power Query Online.

When to choose Dataflows

Scenario: You need to connect to data, transform it, and load it into a Fabric lakehouse. You aren't comfortable using Spark notebooks, so you decide to use Dataflow Gen2. How would you complete this task?

Answer: Create a Dataflow Gen2 to transform the data, then add your lakehouse as the data destination.

You can either use a dataflow by itself or add dataflows to pipelines as activities.

Pipeline Copy Activity vs. Dataflow Gen2 vs. Spark

| Property | Pipeline Copy Activity | Dataflow Gen2 | Spark |
| --- | --- | --- | --- |
| Use case | Data lake and data warehouse migration, data ingestion, lightweight transformation | Data ingestion, data transformation, data wrangling, data profiling | Data ingestion, data transformation, data processing, data profiling |
| Code written | No code, low code | No code, low code | Code |
| Data volume | Low to high | Low to high | Low to high |
| Development interface | Wizard, canvas | Power Query | Notebook, Spark job definition |
| Sources | 30+ connectors | 150+ connectors | Hundreds of Spark libraries |
| Destinations | 18+ connectors | Lakehouse, Azure SQL Database, Azure Data Explorer, Azure Synapse Analytics | Hundreds of Spark libraries |
| Transformation complexity | Low: lightweight (type conversion, column mapping, merge/split files, flatten hierarchy) | Low to high: 300+ transformation functions | Low to high: support for native Spark and open-source libraries |
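
To ground the Spark column of the table, here is a minimal PySpark sketch of an ingestion job as it might look in a Fabric notebook. The file path, column name, and table name are hypothetical:

```python
from pyspark.sql import SparkSession

# In a Fabric notebook a SparkSession named "spark" is already provided;
# getOrCreate() reuses it there (or builds one when run elsewhere).
spark = SparkSession.builder.getOrCreate()

# Hypothetical lakehouse path: read raw CSV files with a header row
df = spark.read.option("header", "true").csv("Files/raw/sales/*.csv")

# Light wrangling: drop duplicate rows and rows missing an order ID
cleaned = df.dropDuplicates().na.drop(subset=["OrderID"])

# Save as a managed Delta table so it shows up under Tables in the lakehouse
cleaned.write.mode("overwrite").format("delta").saveAsTable("sales_orders")
```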