Table of contents
  1. Mount ADLS Gen2 on Databricks Using AAD OAuth & Service Principal
    1. For the busy people
    2. Detailed steps
  2. Conclusion

Mount ADLS Gen2 on Databricks Using AAD OAuth & Service Principal

This guide shows how to securely integrate Databricks with Azure Data Lake Storage Gen2 (ADLS Gen2) using Azure Active Directory (AAD) OAuth and a service principal.

For the busy people

Execute this in a Databricks notebook, replacing the angle-bracket placeholders with your own values:

dbutils.fs.mount(
  source = "adl://.azuredatalakestore.net/",
  mount_point = "/mnt/",
  extra_configs = {
    "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
    "dfs.adls.oauth2.client.id": dbutils.secrets.get(scope = "", key = "client-id"),
    "dfs.adls.oauth2.credential": dbutils.secrets.get(scope = "", key = "client-secret"),
    "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com//oauth2/token"}
)
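
Note that rerunning dbutils.fs.mount against a mount point that already exists raises a "Directory already mounted" error. If you rerun the notebook often, a minimal sketch (same placeholder mount name as above) is to unmount first:

if any(m.mountPoint == "/mnt/<mount-name>" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/<mount-name>")  # remove the stale mount before remounting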

Detailed steps

  1. Azure Setup:
    • Create Service Principal: In the Azure Portal, navigate to Azure Active Directory > App registrations > New registration. Provide a name and register the application. From the app's Overview page, save the Application (client) ID and the Directory (tenant) ID, then create a new client secret under Certificates & secrets and save its value.
    • Assign Role: In your storage account, assign the Storage Blob Data Contributor role to the service principal.
  2. Store Credentials in Key Vault: In Azure Key Vault, add the client ID, client secret, and tenant ID as secrets. Then create an Azure Key Vault-backed secret scope in Databricks that points at this vault; the code below assumes a scope named azbackedscope. You can confirm the scope is wired up as sketched below.
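
    A quick way to confirm the scope exists and exposes the expected keys (assuming the scope name azbackedscope used in the next step):

      print(dbutils.secrets.listScopes())           # all secret scopes visible to this workspace
      print(dbutils.secrets.list("azbackedscope"))  # keys available in the Key Vault-backed scope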

  3. Databricks Configuration: In your Databricks notebook, follow these steps to configure and mount ADLS Gen2:

    • Fetch Credentials:

      # Pull the service principal credentials from the Key Vault-backed secret scope
      clientID = dbutils.secrets.get(scope="azbackedscope", key="regappClientID")
      clientSecret = dbutils.secrets.get(scope="azbackedscope", key="regappClientSecret")
      directoryID = dbutils.secrets.get(scope="azbackedscope", key="regappDirectoryID")
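
      Note that Databricks redacts secret values in notebook output, so printing these variables is safe but uninformative:

      print(clientID)  # prints "[REDACTED]" instead of the real value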
      
    • Set OAuth Configs:

      configs = {
          "fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": clientID,
          "fs.azure.account.oauth2.client.secret": clientSecret,
          # ClientCredsTokenProvider uses the AAD v1.0 token endpoint (no /v2.0 segment)
          "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryID}/oauth2/token"
      }
      
    • Mount Storage:

      storageAccountName = "your_storage_account_name"
      containerName = "your_container_name"
      mountPoint = "/mnt/your_mount_name"
           
      adlsPath = f"abfss://{containerName}@{storageAccountName}.dfs.core.windows.net/"
           
      dbutils.fs.mount(
          source=adlsPath,
          mount_point=mountPoint,
          extra_configs=configs
      )
      
    • Verify Mount:

      display(dbutils.fs.ls(mountPoint))
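
    • Unmount When Needed:

      If you rotate the client secret or no longer need the mount, remove it explicitly so stale credentials are not left cached:

      dbutils.fs.unmount(mountPoint)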
      

Conclusion

With ADLS Gen2 mounted on DBFS, reading and writing data becomes simpler: you authenticate once at mount time instead of in every job, and afterwards the storage behaves like a local path.
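
For example, once mounted, both Spark and plain Python file APIs can address the data by path alone; the file name events.csv below is hypothetical:

# Spark reads straight from the mount point, with no credentials in the code
df = spark.read.csv("/mnt/your_mount_name/events.csv", header=True)

# On the driver, the same files are visible through the /dbfs FUSE path
with open("/dbfs/mnt/your_mount_name/events.csv") as f:
    print(f.readline())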