Table of contents
  1. A Simple Synapse Pipeline. Copy files from Laptop To ADLS
    1. For the busy people
    2. Create and Configure an Integration Runtime
    3. Create Two Linked Services (Connection Strings)
      1. Linked Service to Laptop’s Folder
        1. A Common Error
      2. Linked Service to ADLS
    4. Create a Pipeline with Copy Data Activity
      1. Crate a New Pipeline in Syanpse Workspace
      2. Add Copy Data Activity
      3. Configure the Source Dataset etc
      4. Configure the Sink Dataset
      5. Execute the Pipeline
    5. Appendix
      1. Manually Installating Integration Runtime

alt text

A Simple Synapse Pipeline. Copy files from Laptop To ADLS

We have a Synapse workspace and some CSV files on our laptop that we want to upload to ADLS. Here is the Microsoft recommended way to do it:

  1. Install SHIR on the laptop.
  2. Create a pipeline with a copy data activity.
  3. Run the pipeline.

    In Power Platform, the SHIR is replaced by the on-premise gateway. Both are software installed on your local machine, but one is for Synapse and the other is for Power Platform, Fabric, etc.

Let’s get our hands dirty and see how to do it.

For the busy people

An outline of all the activities typically involved in this process is provided in the table below. For more details, please continue reading the following sections.

Step Action Definition
1: Open Synapse Workspace Synapse Workspace Every synapse instance has a workspace. This is the central place for all synapse activities.
2: Install SHIR on Laptop Self-hosted Integration Runtime (SHIR) A software installed on your laptop to enable data movement from on-premises to the cloud. More details on Integration Pipelines
3: Create Linked Services Linked Services These are connection strings to your data sources (local file system) and destinations (ADLS).
4: Define Datasets Datasets These are like file type. In this case the source and destination datasets are .csvs
5: Build the Pipeline Pipeline A pipline is a workflow, it contains many tasks like Copy data etc.
6: Add a Copy Data Activity Copy Data Activity This is a very important activity that takes care of the entire data copying.
7: Set Up Triggers Triggers This will tell what will trigger the pipeline

Create and Configure an Integration Runtime

The very first step is to install the integration runtime on the local machine. The IR is the backbone of the connection between the local folder and Synapse.

  • In your synapse workspace go to Manage -> Integration -> Integration runtime.
  • Click on New, then in the settings, you will have two install options. Choose an express setup. alt text

    Express setup is a quicker option as it both installs and links the local IR environment with the synapse setup. If you prefer to do a manual setup, refer to to appenxis.

Create Two Linked Services (Connection Strings)

Next, we need to create two connection strings (also known as Linked Services): one to the local laptop’s folder (source) and another to the ADLS (destination).

Linked Service to Laptop’s Folder

  1. In Synapse workspace, go to Manage -> Linked Services -> New.
  2. Select File System and provide a name for the linked service. alt text
  3. Select the Integration Runtime we created earlier.
  4. Specify the path to the CSV files on your laptop and provide a user name and password which has read/write access to the folder. alt text

    Here, sa is a local user which has read/write access to the folder.

    alt text

Go to the properties of the source folder and navigate to the security tab to check if the user has the appropriate permissions for the folder.

alt text

A Common Error

After setting up the linked service when you Test connection it may fail. alt text

This has nothing to do with the setup but a windows security feature which causes the issue. To resolve this, open Command Prompt as Administrator and run the following commands:

   cd C:\Program Files\Microsoft Integration Runtime\5.0\Shared
   .\dmgcmd.exe -DisableLocalFolderPathValidation

This will disable local folder path validation, and Test connection will pass this time. alt text

Linked Service to ADLS

  1. Navigate to Manage -> Linked Services -> New.
  2. Select Azure Data Lake Storage Gen2. alt text

  3. In Our case we will use AutoResolveIntegrationRuntime. Sometimes its a good choice. alt text

Create a Pipeline with Copy Data Activity

Now that the linked services are configured, create a pipeline to copy data:

Crate a New Pipeline in Syanpse Workspace

In Synapse workspace, go to Integrate -> Pipelines -> New Pipeline. alt text

Add Copy Data Activity

Drag and drop the Copy Data activity onto the pipeline canvas. alt text

Configure the Source Dataset etc

  1. Choose dataset: Go to the Source tab, then Files, select File System, and click Continue. alt text

  2. Choose File Format: Now, you have to select the format of the source files. We have CSV, so we will select Delimited Text. alt text

  3. Select Linked Service: Next, select the Linked Service which we created earlier. This is the connection string that connects to the Laptops folder. You will see the File path and other details appear. Choose First row as header, which is usually the case for all CSVs. alt text

  4. Preview data: If successful, you can preview the data. It will load one of the files to show you how the data looks, displaying a well-formatted table. Note, how we have seleccted *.csv to load all the csv files in the folder. alt text

Configure the Sink Dataset

  1. Select Integration Dataset: Go to the Sink tab, then select Azure Data Lake Storage Gen2. alt text

  2. Selct File Format: Now, we need to provide the format in which the data will be copied to the destination. For this, select DelimitedText. alt text

  3. Select Linked Service & IR: Next, select the linked service which has the connection information to the container in ADLS where your data will be stored. You can choose any integration runtime. Here, I have chosen the default AutoResolveIntegrationRuntime as it is the simplest and comes factory-shipped with the Synapse workspace. alt text

  4. Choose other properties: Once the sink dataset is configured, you can choose other properties like Copy behavior, etc. alt text

Execute the Pipeline

  1. Validate the Pipeline: Ensure all configurations are correct and validate the pipeline.
  2. Run the Pipeline: Execute the pipeline to start the data transfer from your laptop to ADLS. If it runs successfully you will see the data copied to your desired ADLS container. alt text

Appendix

Manually Installating Integration Runtime

The integration runtime can also be downloaded and installed separately from the Microsoft software store. Integration runtime download screen

Install it on your local machine. The steps are straightforward. Just click through the installation process.

Integration runtime installation steps

In the final step, you will need to register the Runtime by copying and pasting the authentication key from the Synapse portal. Integration runtime registration