Exploring about Macros & Ref in dbt

Rasiksuhail
5 min readApr 6, 2023

--

dbt

dbt (Data Build Tool) is an open-source Python application that uses modular SQL queries to allow data engineers and analysts to transform data in their warehouses. It takes care of the ‘T’ in ETL procedures and handles the data engineering that comes before analytics. It does not handle the extraction or loading of data and assumes that the raw data is already loaded into the data warehouse.

In this article, we will be covering some of the features available in dbt such as macros, ref , sources, jinja templates and so on. If you want to learn basics of dbt, check out my other post ‘dbt

Ok, let’s start with macros

Macros in dbt are reusable chunks of SQL code that can be defined in a separate file and then called from within your dbt models. Macros can help to simplify complex SQL queries, automate repetitive tasks, and enforce best practices across your dbt project.

To create a macro in dbt, you would typically create a file in your project’s “macros” directory with the extension “.sql”. Inside the file, you would define your macro using standard SQL syntax, and then save the file.

For example, here’s a simple macro that adds a prefix to a given string:

{% macro add_prefix(str, prefix) %}
SELECT '{{ prefix }}' || {{ str }} AS prefixed_str
{% endmacro %}

In this example, the macro is called “add_prefix” and takes two parameters: “str” and “prefix”. The macro then returns a SQL query that concatenates the prefix with the given string.

To use the macro in your dbt models, you would simply call it like this:

{{ macros.add_prefix('my_string', 'prefix_') }}

This would generate the following SQL code:

SELECT 'prefix_' || 'my_string' AS prefixed_str

Now let’s move on to ref.

In dbt, “ref” stands for “reference”. A ref is a way to reference another dbt model in your project. You can think of a ref as a shortcut to the SQL code that defines the referenced model.

For example, let’s say you have a model called “orders” that contains information about customer orders, and another model called “customers” that contains information about customers. You might want to join these two models together to create a new model that contains customer order information with customer details included.

To do this, you could use the “ref” function to reference the “orders” and “customers” models and join them together:

{{ 
join(
table='{{ ref('orders') }}',
on='{{ ref('customers').id }} = {{ ref('orders').customer_id }}'
)
}}

n this example, the “ref” function is used to reference the “orders” and “customers” models, and then the “join” function is used to join them together based on the customer ID.

Using “ref” functions can help to simplify complex SQL queries and make it easier to maintain your dbt project. It also allows you to easily reuse code across multiple models, which can save time and reduce duplication.

You have learnt macros & ref ! let’s move on to another feature in dbt: “sources”.

In dbt, a “source” is a connection to an external data system, such as a database or a file. Once you have defined a source in dbt, you can use it to reference tables or files in that system and create dbt models based on them.

To define a source in dbt, you would typically create a file in your project’s “sources” directory with the extension “.yml”. Inside the file, you would define the connection information for the external system, such as the database name, the host, and the credentials.

Here’s an example of a source definition for a PostgreSQL database:

sources:
- name: my_postgres_database
database: my_database
host: my_host
port: 5432
schema: public
user: my_username
password: my_password
type: postgres

n this example, the source is named “my_postgres_database” and connects to a PostgreSQL database with the name “my_database” on the host “my_host”. The schema is set to “public” and the credentials are provided via the “user” and “password” fields.

Once you have defined a source, you can use it to reference tables or files in that system in your dbt models. For example, you could create a model that references a table in the “my_postgres_database” source like this:

{{ 
config(
materialized='table'
)
}}
SELECT *
FROM {{ source('my_postgres_database', 'my_table') }}

In this example, the “source” function is used to reference the “my_table” table in the “my_postgres_database” source. This allows you to create a dbt model based on the data in the external system.

Using sources in dbt can help to simplify your data pipelines and make it easier to manage data from multiple sources. It also allows you to easily incorporate external data into your dbt project, which can be useful for tasks like data integration or data migration.

Cool ! You have come this far. Lets learn Jinja templates.

inja is a powerful templating engine that is integrated into dbt. With Jinja, you can write dynamic SQL code that can be customized based on variables or inputs. This allows you to create more flexible and reusable data pipelines.

For example, suppose you have a data model that needs to join a table based on a specific date range. You could use Jinja to create a dynamic SQL statement that incorporates that date range. Here’s an example:

{% set date_range = (ref('my_date_range_model')).date_range %}
select *
from my_table
where date >= '{{ date_range.start }}' and date <= '{{ date_range.end }}'

In this example, the date_range variable is set based on a my_date_range_model that has been defined elsewhere in the dbt project. This allows you to create a reusable date range that can be used across multiple models.

Overall, Jinja templates is a powerful features in dbt that allow you to create more flexible and reusable models. By using these features, you can create data models that are more scalable, maintainable, and efficient.

Kudos ! You have unlocked the skills of macros ,ref, sources & jinja !

Will be writing few more articles with respect to dbt !! Stay tuned !!

Till then get your hands dirty on dbt :-)

Do Check out my other articles

--

--

Responses (1)