Mount ADLS Gen2 on Databricks Using AAD OAuth & Service Principal
Integrate Databricks with Azure Data Lake Storage Gen2 (ADLS Gen2) securely using Azure Active Directory (AAD) OAuth and a service principal.
For the busy people
Execute this in a Databricks notebook:
dbutils.fs.mount(
    source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope = "<scope>", key = "client-id"),
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "<scope>", key = "client-secret"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
    }
)
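Note that dbutils.fs.mount fails if the mount point is already in use, so a common guard is to check dbutils.fs.mounts() first. A minimal sketch (configs here is a hypothetical name for the extra_configs dict shown above):

mount_point = "/mnt/<mount-name>"
# Only mount if this mount point is not already registered
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
        mount_point = mount_point,
        extra_configs = configs  # the OAuth dict from the snippet above
    )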
Detailed steps
- Azure Setup:
  - Create Service Principal: In the Azure Portal, navigate to Azure Active Directory > App registrations > New registration. Provide a name and register the application. Save the Application (client) ID and create a new client secret under Certificates & secrets. Save the client secret value.
  - Assign Role: In your storage account, assign the Storage Blob Data Contributor role to the service principal.
  - Store Credentials in Key Vault: In Azure Key Vault, add the client ID, client secret, and tenant ID as secrets. These are read below through a Key Vault-backed Databricks secret scope (named azbackedscope in the examples).
- Databricks Configuration: In your Databricks notebook, follow these steps to configure and mount ADLS Gen2:
  - Fetch Credentials:
    clientID = dbutils.secrets.get(scope="azbackedscope", key="regappClientID")
    clientSecret = dbutils.secrets.get(scope="azbackedscope", key="regappClientSecret")
    directoryID = dbutils.secrets.get(scope="azbackedscope", key="regappDirectoryID")
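    If any of these lookups fails, the Key Vault-backed scope may not be wired up yet. A quick sanity check (a sketch, using the same scope name) lists what Databricks can see:

    # Confirm the secret scope exists and which keys it exposes
    print(dbutils.secrets.listScopes())
    print(dbutils.secrets.list("azbackedscope"))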
  - Set OAuth Configs (these settings can also be reused for direct access without a mount; see the sketch after this list):
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": clientID,
        "fs.azure.account.oauth2.client.secret": clientSecret,
        "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryID}/oauth2/token"
    }
  - Mount Storage:
    storageAccountName = "your_storage_account_name"
    containerName = "your_container_name"
    mountPoint = "/mnt/your_mount_name"
    adlsPath = f"abfss://{containerName}@{storageAccountName}.dfs.core.windows.net/"
    dbutils.fs.mount(
        source=adlsPath,
        mount_point=mountPoint,
        extra_configs=configs
    )
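    If the mount point already exists with stale settings (for example, after rotating the client secret), unmount before remounting. A minimal sketch:

    # Drop a stale mount before remounting with fresh configs
    dbutils.fs.unmount(mountPoint)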
  - Verify Mount:
    display(dbutils.fs.ls(mountPoint))
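As referenced above, mounting is optional: the same OAuth settings can instead be set per storage account on the Spark session, which enables direct abfss:// access without any mount. A minimal sketch reusing the variables from the steps above:

# Session-scoped alternative to mounting: configure OAuth per storage account
suffix = f"{storageAccountName}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", clientID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", clientSecret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{directoryID}/oauth2/token")
# Data is then addressed directly as abfss://<container>@<account>.dfs.core.windows.net/...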
Conclusion
With ADLS Gen2 storage mounted on DBFS, you can read and write data more conveniently. The code becomes simpler as you don’t have to authenticate every time, making it feel like accessing local storage.
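For example, once mounted, a dataset under the mount reads like any DBFS path (the Parquet path below is a hypothetical placeholder):

# Read a hypothetical Parquet dataset through the mount
df = spark.read.parquet(f"{mountPoint}/path/to/data.parquet")
display(df)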