Skip to content

Setup Project

Background

Here, I will show you how to setup a simple dbt project setup from scratch. You will just need a windows laptop for this as I have done it on my Windows machine.

Google console pre-requisite setup

Step 1: Create a Google Cloud Project

  1. Go to the Google Cloud Console: Google Cloud Console.
  2. Create a New Project:
  3. In the top left corner, click on the Project dropdown and select New Project.
  4. Enter a Project Name (e.g., my-dbt-project).
  5. Note down the Project ID; you’ll need this later. This is typically something like my-dbt-project-123456.

  6. Set the Project as Active:

  7. Click on the Project dropdown again and select your newly created project to make it active.

Step 2: Enable BigQuery API

  1. Enable the BigQuery API:
  2. In the Google Cloud Console, navigate to APIs & Services > Library.
  3. Search for "BigQuery API" and click Enable.

Step 3: Create a Service Account

  1. Create a Service Account:
  2. Go to IAM & Admin > Service Accounts in the Google Cloud Console.
  3. Click + CREATE SERVICE ACCOUNT.
  4. Enter a Service Account Name (e.g., dbt-service-account).
  5. Click Create and Continue.

  6. Grant the Service Account Permissions:

  7. Under Role, select BigQuery Admin. This gives the service account full access to BigQuery.
  8. Click Continue and then Done.

  9. Create and Download the JSON Key:

  10. In the Service Accounts list, find your new service account and click on it.
  11. Go to the Keys tab, click Add Key > Create New Key.
  12. Select JSON and click Create.
  13. The JSON key file will be downloaded to your computer. This is the key file you will use in dbt.
  1. Install Google Cloud SDK:
  2. If you don’t already have it installed, you can download it from Google Cloud SDK.
  3. Follow the instructions to install it on your machine.

  4. Authenticate with the SDK:

  5. Open a terminal and run:
    gcloud auth login
    
  6. This will open a browser window where you can log in with your Google account.

  7. Set the Active Project:

  8. Run the following command to set your project as the active project:
    gcloud config set project your_project_id
    
  9. Replace your_project_id with the actual Project ID you noted earlier.

Reference: https://www.youtube.com/watch?v=DzxtCxi4YaA https://robust-dinosaur-2ef.notion.site/PUBLIC-Retail-Project-af398809b643495e851042fa293ffe5b

Step 5: Create a BigQuery Dataset

  1. Navigate to BigQuery in the Google Cloud Console.
  2. Create a Dataset:
  3. Click on your project name to expand it, then click Create Dataset.
  4. Enter a Dataset ID (e.g., my_dataset).
  5. Configure location and other settings as needed, then click Create Dataset.

Step 6: Note Down the Information

Make sure you have the following information handy: - Project ID: Your Google Cloud Project ID (e.g., my-dbt-project-123456). - Service Account JSON Key File: The path to the JSON file you downloaded (e.g., /path/to/your-service-account-file.json). - Dataset ID: The ID of the dataset you created (e.g., my_dataset).

Step 7: Start Creating Your dbt Project

Now you’re ready to create your dbt project using the above information. When dbt asks for: - BigQuery: Choose BigQuery as your data warehouse. - Key File Path: Provide the path to the JSON key file. - Project ID: Enter the Google Cloud Project ID. - Dataset: Enter the Dataset ID you created in BigQuery.

Step 1: Install dbt and BigQuery Adapter

  1. Just create a folder, say, dbt-bigquery, CD to it and run python -m venv dbt-venv. You will see a folder created dbt-venv
  2. Activate the dbt-venv by running dbt-venv\Scripts\activate
  3. Run the following commands to install dbt and the BigQuery adapter:
    pip install dbt-core dbt-bigquery
    
    This will install dbt and the necessary adapter to connect to BigQuery.

Step 3: Initialize a dbt Project

  1. In your terminal inside Visual Studio Code, run:

    dbt init your_project_name
    
    Replace your_project_name with the name you want for your dbt project.

  2. Navigate to the newly created project directory:

    cd your_project_name
    

Step 4: Configure Your dbt Project for BigQuery

  1. Open the profiles.yml file:
  2. If you don't have one, you can create it in ~/.dbt/.
  3. The file should look like this:
    your_project_name:
      target: dev
      outputs:
        dev:
          type: bigquery
          method: service-account
          project: your-gcp-project-id
          dataset: your_dataset_name
          threads: 1
          keyfile: /path/to/your-service-account-file.json
          timeout_seconds: 300
          location: your_location
    
  4. Replace your_project_name, your-gcp-project-id, your_dataset_name, and /path/to/your-service-account-file.json with your actual project details.

  5. Save the file.

Step 5: Create Models and Start Developing

  1. Open the models folder in your dbt project.
  2. Create a new .sql file for each model you want to define. For example:
    -- my_first_model.sql
    SELECT *
    FROM `your_gcp_project.your_dataset.your_table`
    
    Replace your_gcp_project, your_dataset, and your_table with the actual names.

Step 6: Run Your dbt Models

  1. In the terminal, inside your dbt project directory, run:
    dbt run
    
    This will execute your models and create the tables or views in BigQuery.

Step 7: Check Your Work

  1. Check BigQuery to see if your tables or views have been created as expected.

Step 8: Version Control (Optional)

  1. Initialize a Git repository in your project directory:
    git init
    
  2. Add and commit your files:
    git add .
    git commit -m "Initial commit"
    

Final Tips

  • Always test your models by running dbt run regularly to ensure everything is working.
  • Use dbt docs to generate documentation with dbt docs generate.

That's it! You've set up a dbt project for BigQuery using Visual Studio Code. Happy developing!