Introduction to dbt — Step by Step Guide for Beginners

Rasiksuhail
7 min readApr 3, 2023

--

Data Build Tool

Are you tired of manually managing your data pipelines and spending hours debugging SQL code? Say goodbye to the drudgery and hello to dbt — the transformative data modeling and transformation framework that will revolutionize your data pipeline and supercharge your productivity!With dbt, you can easily transform, test, and document your data pipeline, all while working in a modular and collaborative environment. In this step-by-step guide, we’ll show you how to get started with dbt and take your data pipeline to the next level!

In this article, we will go through set of instructions on how to get started with dbt

Overview of what will be covered:

  1. Setup: We’ll start by installing dbt and setting up a new dbt project. This will involve creating a new project directory, setting up your database connection, and installing any necessary dependencies.
  2. Write your first models: I’ll help you write your first models using the SQL language. We’ll start with simple transformations and work our way up to more complex scenarios.
  3. Testing: We’ll learn how to test our models using the dbt test command. We’ll cover the basics of writing tests and asserting their results.
  4. Documentation: We’ll learn how to document our models using dbt’s built-in documentation features. We’ll cover how to write descriptions, add schema information, and generate HTML documentation.

To start, let’s make sure that you have dbt installed on your computer. You can check if dbt is installed by opening your terminal and running the following command

dbt --version

If dbt is installed, you should see a version number printed in the terminal. If not, follow the steps below

  • Install Python: dbt requires Python 3.6 or later. If you don’t have Python installed, you can download it from the official website: https://www.python.org/downloads/.
  • Install dbt: Once you have Python installed, you can install dbt using pip. Open a command prompt or terminal and run the following command:
pip install dbt

This will install the latest version of dbt and its dependencies.

  • Verify the installation: After installation, you can verify that dbt is installed correctly by running the following command in the command prompt or terminal:
dbt --version
  • This will display the version of dbt that you have installed.
  • Configure your database credentials: Before you can use dbt, you need to configure your database credentials. To do this, create a new file called profiles.yml in your home directory (~/.dbt on Mac/Linux or %USERPROFILE%\.dbt on Windows) with the following contents:
default:   
target: <YOUR_TARGET_NAME>
outputs: <YOUR_TARGET_NAME>:
type: <YOUR_DATABASE_TYPE>
host: <YOUR_DATABASE_HOST>
port: <YOUR_DATABASE_PORT>
user: <YOUR_DATABASE_USER>
password: <YOUR_DATABASE_PASSWORD>
database: <YOUR_DATABASE_NAME>
  • Replace <YOUR_TARGET_NAME>, <YOUR_DATABASE_TYPE>, <YOUR_DATABASE_HOST>, <YOUR_DATABASE_PORT>, <YOUR_DATABASE_USER>, <YOUR_DATABASE_PASSWORD>, and <YOUR_DATABASE_NAME> with your actual database credentials.
  • Test your connection: Once you’ve configured your database credentials, you can test your connection by running the following command in the command prompt or terminal:
dbt debug

This will verify that dbt is able to connect to your database and display information about your database configuration.

That’s it! You’re now ready to start using dbt.

The next step is to set up a new dbt project. To do this, open your terminal and create a new directory for your project:

mkdir my-dbt-project
cd my-dbt-project

Next, initialize your dbt project using the following command:

dbt init

This will create a new dbt project in the current directory, with the following directory structure:

my-dbt-project/
models/
README.md
dbt_project.yml
profiles.yml

The models directory is where you will write your dbt models. The dbt_project.yml file contains project-level configuration options, such as the project name, version, and target database. The profiles.yml file contains database connection information for your local and remote databases.

Now that you have your project set up, let’s create your first dbt model. In the models directory, create a new file called sample.sql and add the following SQL code:

SELECT
date,
SUM(sales) as total_sales
FROM my_table
GROUP BY 1

This is a simple SQL query that groups the sales column of a table called my_table by date and calculates the total sales for each date.

Next, let’s test your model. Open your terminal and run the following command:

dbt run

This will compile and run all of the models in your project, including the sample.sql model we just created. If everything is set up correctly, you should see a message in your terminal indicating that the model was compiled and run successfully.

Finally, let’s document your model. Open the models directory and create a new file called sample.md. In this file, add a description of your model, such as:

# Example Model
This model calculates the total sales by date for the `my_table` table.

Now, run the following command in your terminal to generate HTML documentation for your project:

dbt docs generate

This will generate a set of HTML files in a new directory called target. Open the target/index.html file in your browser to see the documentation for your sample model.

Great !! you’ve just created your first dbt model and documented it!

Now lets create a little more complex dbt model that joins data from multiple tables and applies more transformations.

In your models directory, create a new file called sales.sql and add the following SQL code

-- Define the source tables for our model
{{ config(
materialized='view',
schema='my_schema'
) }}
WITH orders AS (
SELECT
order_id,
customer_id,
order_date,
order_total
FROM orders
),
order_items AS (
SELECT
order_id,
product_id,
quantity,
price
FROM order_items
),

-- Join the orders and order_items tables
combined AS (
SELECT
o.order_id,
o.customer_id,
o.order_date,
oi.product_id,
oi.quantity,
oi.price,
oi.quantity * oi.price AS item_total,
o.order_total
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_id
),
-- Calculate some aggregate metrics
aggregated AS (
SELECT
customer_id,
DATE_TRUNC('week', order_date) AS week_start,
SUM(order_total) AS total_sales,
AVG(item_total) AS avg_item_price,
SUM(quantity) AS total_items
FROM combined
GROUP BY 1, 2
)
-- Define the final output of our model
SELECT
customer_id,
week_start,
total_sales,
avg_item_price,
total_items
FROM aggregated

This model joins the orders and order_items tables on the order_id column, calculates some aggregate metrics by week and customer, and outputs the results.

To test your model, run the following command in your terminal:

dbt run

This will compile and run all of the models in your project, including the sales.sql model we just created. After you run, you should see a message in your terminal indicating that the model was compiled and run successfully.

So, let’s document your model again. Open the models directory and create a new file called sales.md. In this file, add a description of your model, such as:

# Sales Model
This model joins the `orders` and `order_items` tables and calculates aggregate sales metrics by week and customer.

Now, again run the following command in your terminal to generate HTML documentation for your project:

dbt docs generate

This will regenerate the HTML documentation for your project, including your sales model documentation. Open the target/index.html file in your browser to see the updated documentation.

Wowww, you’ve just created a more complex dbt model that joins data from multiple tables and applies more transformations.

Next, we’ll focus on using dbt to test our models and ensure that they’re producing the correct results.

Assuming you have a database set up with some sample data and you’ve completed the previous tasks, let’s create a new test for the sales model we just created.

In your models directory, create a new file called sales_test.sql and add the following SQL code:

-- Define the source data for our test
{{ config(
test_relation='sales_test_data',
test_relation_schema='my_schema'
) }}
WITH test_data AS (
SELECT
1 AS customer_id,
DATE_TRUNC('week', CURRENT_DATE() - INTERVAL '1' WEEK) AS week_start,
500 AS total_sales,
50 AS avg_item_price,
10 AS total_items
UNION ALL
SELECT
2 AS customer_id,
DATE_TRUNC('week', CURRENT_DATE() - INTERVAL '1' WEEK) AS week_start,
1000 AS total_sales,
100 AS avg_item_price,
20 AS total_items
),

-- Define the expected output of our test
expected_output AS (
SELECT
customer_id,
week_start,
total_sales,
avg_item_price,
total_items
FROM test_data
),
-- Define the actual output of our model
actual_output AS (
SELECT
customer_id,
week_start,
total_sales,
avg_item_price,
total_items
FROM {{ ref('sales') }}
WHERE customer_id IN (1, 2)
)
-- Compare the expected and actual output
SELECT
'sales_model_test' AS test_case,
CASE
WHEN COUNT(*) = 0 THEN 'pass'
ELSE 'fail'
END AS test_result,
NULL AS test_output
FROM expected_output
EXCEPT
SELECT
'sales_model_test' AS test_case,
CASE
WHEN COUNT(*) = 0 THEN 'pass'
ELSE 'fail'
END AS test_result,
NULL AS test_output
FROM actual_output

This test creates some sample data for a single week and two customers, defines the expected output based on that data, runs the sales model to get the actual output, and compares the expected and actual output. If the expected and actual output match, the test passes; if they don't, the test fails.

To test your model, run the following command in your terminal:

dbt test

This will compile and run all of the tests in your project, including the sales_test.sql test we just created. If everything is set up correctly and your model is producing the correct output, you should see a message in your terminal indicating that the test passed.

Superb, you’ve just created a dbt test to ensure that your sales model is producing the correct output!

This is how you will create a model in dbt and test it out !!

Happy Learning :-)

--

--