Setup Project
Background¶
Here, I will show you how to setup a simple dbt project setup from scratch. You will just need a windows laptop for this as I have done it on my Windows machine.
Google console pre-requisite setup¶
Step 1: Create a Google Cloud Project¶
- Go to the Google Cloud Console: Google Cloud Console.
- Create a New Project:
- In the top left corner, click on the Project dropdown and select New Project.
- Enter a Project Name (e.g.,
my-dbt-project
). -
Note down the Project ID; you’ll need this later. This is typically something like
my-dbt-project-123456
. -
Set the Project as Active:
- Click on the Project dropdown again and select your newly created project to make it active.
Step 2: Enable BigQuery API¶
- Enable the BigQuery API:
- In the Google Cloud Console, navigate to APIs & Services > Library.
- Search for "BigQuery API" and click Enable.
Step 3: Create a Service Account¶
- Create a Service Account:
- Go to IAM & Admin > Service Accounts in the Google Cloud Console.
- Click + CREATE SERVICE ACCOUNT.
- Enter a Service Account Name (e.g.,
dbt-service-account
). -
Click Create and Continue.
-
Grant the Service Account Permissions:
- Under Role, select BigQuery Admin. This gives the service account full access to BigQuery.
-
Click Continue and then Done.
-
Create and Download the JSON Key:
- In the Service Accounts list, find your new service account and click on it.
- Go to the Keys tab, click Add Key > Create New Key.
- Select JSON and click Create.
- The JSON key file will be downloaded to your computer. This is the key file you will use in dbt.
Step 4: Set Up Google Cloud SDK (Optional, but Recommended)¶
- Install Google Cloud SDK:
- If you don’t already have it installed, you can download it from Google Cloud SDK.
-
Follow the instructions to install it on your machine.
-
Authenticate with the SDK:
- Open a terminal and run:
-
This will open a browser window where you can log in with your Google account.
-
Set the Active Project:
- Run the following command to set your project as the active project:
- Replace
your_project_id
with the actual Project ID you noted earlier.
Reference: https://www.youtube.com/watch?v=DzxtCxi4YaA https://robust-dinosaur-2ef.notion.site/PUBLIC-Retail-Project-af398809b643495e851042fa293ffe5b
Step 5: Create a BigQuery Dataset¶
- Navigate to BigQuery in the Google Cloud Console.
- Create a Dataset:
- Click on your project name to expand it, then click Create Dataset.
- Enter a Dataset ID (e.g.,
my_dataset
). - Configure location and other settings as needed, then click Create Dataset.
Step 6: Note Down the Information¶
Make sure you have the following information handy: - Project ID: Your Google Cloud Project ID (e.g., my-dbt-project-123456
). - Service Account JSON Key File: The path to the JSON file you downloaded (e.g., /path/to/your-service-account-file.json
). - Dataset ID: The ID of the dataset you created (e.g., my_dataset
).
Step 7: Start Creating Your dbt Project¶
Now you’re ready to create your dbt project using the above information. When dbt asks for: - BigQuery: Choose BigQuery as your data warehouse. - Key File Path: Provide the path to the JSON key file. - Project ID: Enter the Google Cloud Project ID. - Dataset: Enter the Dataset ID you created in BigQuery.
Step 1: Install dbt and BigQuery Adapter¶
- Just create a folder, say, dbt-bigquery, CD to it and run
python -m venv dbt-venv
. You will see a folder createddbt-venv
- Activate the dbt-venv by running
dbt-venv\Scripts\activate
- Run the following commands to install dbt and the BigQuery adapter: This will install dbt and the necessary adapter to connect to BigQuery.
Step 3: Initialize a dbt Project¶
-
In your terminal inside Visual Studio Code, run:
Replaceyour_project_name
with the name you want for your dbt project. -
Navigate to the newly created project directory:
Step 4: Configure Your dbt Project for BigQuery¶
- Open the
profiles.yml
file: - If you don't have one, you can create it in
~/.dbt/
. - The file should look like this:
-
Replace
your_project_name
,your-gcp-project-id
,your_dataset_name
, and/path/to/your-service-account-file.json
with your actual project details. -
Save the file.
Step 5: Create Models and Start Developing¶
- Open the
models
folder in your dbt project. - Create a new
.sql
file for each model you want to define. For example: Replaceyour_gcp_project
,your_dataset
, andyour_table
with the actual names.
Step 6: Run Your dbt Models¶
- In the terminal, inside your dbt project directory, run: This will execute your models and create the tables or views in BigQuery.
Step 7: Check Your Work¶
- Check BigQuery to see if your tables or views have been created as expected.
Step 8: Version Control (Optional)¶
- Initialize a Git repository in your project directory:
- Add and commit your files:
Final Tips¶
- Always test your models by running
dbt run
regularly to ensure everything is working. - Use dbt docs to generate documentation with
dbt docs generate
.
That's it! You've set up a dbt project for BigQuery using Visual Studio Code. Happy developing!