Explore Custom Snapshotting in dbt: A Guide to Capture Historical Data

4 min readJun 28, 2023

Custom snapshotting is a powerful feature in dbt that allows you to capture and store point-in-time snapshots of your data. It provides the ability to create historical analysis or track changes over time, enabling you to perform advanced analytics and gain insights into data trends.

In this blog, we will explore custom snapshotting in dbt in detail, including its importance, how to implement it, and provide code snippets and examples to illustrate its usage.

Let’s start

1. Why is Custom Snapshotting Important?

Custom snapshotting is important for several reasons:

Historical Analysis: Custom snapshots enable you to analyze historical data and perform trend analysis. By capturing data at different points in time, you can track changes, identify patterns, and make informed decisions based on historical trends.

Data Auditing: Snapshots provide a way to audit data changes and track data lineage. You can easily trace back to a specific snapshot to understand the state of the data at a given time, ensuring data integrity and facilitating compliance requirements.

Versioning: Snapshots serve as a versioning mechanism for your data. If you make changes to your data model or transformation logic, you can compare snapshots to understand the impact of those changes and revert back if needed.

Time-Travel Queries: With snapshots, you can perform time-travel queries by querying the data as it existed at a specific point in time. This is useful for historical reporting, debugging, or investigating data-related issues.

2. Implementing Custom Snapshotting in dbt

Implementing custom snapshotting in dbt involves the following steps:

Step 1 — Define Snapshot Tables: Start by defining snapshot tables in your data warehouse to store the snapshot data. These tables should have the necessary columns to capture the relevant information, such as the timestamp, primary key, and the attributes you want to snapshot.

Step 2 — Create a Snapshot Model: In dbt, create a snapshot model that defines the logic for capturing and storing the snapshot data. This model should query the source tables and insert the snapshot data into the snapshot tables. You can use dbt’s SQL and Jinja templating to define the logic.

Step 3 — Schedule Snapshot Runs: Configure dbt to schedule the snapshot model runs at the desired intervals. This ensures that snapshots are created periodically, capturing the data changes over time.

Step 4 — Querying Snapshots: Once the snapshots are created, you can query them like any other table in your data warehouse. You can perform analysis, join with other tables, or apply filters based on the snapshot timestamp.

3. Code Snippets and Examples

Let’s illustrate the process of custom snapshotting in dbt with a practical example. Suppose we have an e-commerce database with a “products” table, and we want to create snapshots of the product inventory to track changes over time.

Step 1 — Define Snapshot Tables

Create a snapshot table called “product_snapshots” with the following columns:

CREATE TABLE product_snapshots (
  snapshot_timestamp TIMESTAMP,
  product_id INT,
  product_name VARCHAR,
  inventory_count INT
);

Step 2 — Create a Snapshot Model

Create a dbt model called “product_snapshot_model” that defines the logic for capturing the snapshot data:

-- product_snapshot_model.sql
{{ config(
    materialized='snapshot',
    unique_key='product_id, snapshot_timestamp'
) }}

SELECT
  CURRENT_TIMESTAMP() AS snapshot_timestamp,
  product_id,
  product_name,
  inventory_count
FROM
  {{ ref('products') }}

Step 3 — Schedule Snapshot Runs

In your dbt project configuration file, define a schedule for the snapshot model:

# dbt_project.yml
...
models:
  product_snapshot_model:
    +materialized: snapshot
    +unique_key: product_id, snapshot_timestamp
    +tags:
      - snapshot
    +config:
      +enabled: true
      +materialized: snapshot
    +tests:
      - snapshot_data_changed
      - snapshot_data_unexpected
    snapshots:
      - name: product_snapshots
        strategy: timestamp
        updated_at: snapshot_timestamp
        columns:
          - snapshot_timestamp
          - product_id
          - product_name
          - inventory_count
        unique_key: product_id, snapshot_timestamp
        check_cols: [product_name, inventory_count]
        tags: [snapshot]
        tests:
          - snapshot_data_changed
          - snapshot_data_unexpected
        schema: snapshots
        sort: true
    +snapshot_strategy: timestamp
    +snapshot_columns:
      - snapshot_timestamp
      - product_id
      - product_name
      - inventory_count
    +snapshot_unique_key: product_id, snapshot_timestamp
...

Step 4 — Querying Snapshots

Once the snapshot model runs, you can query the “product_snapshots” table to analyze the historical data:

SELECT *
FROM product_snapshots
WHERE snapshot_timestamp >= '2022-01-01'

This query will retrieve all the snapshots of the product inventory captured after January 1, 2022.

So, this is how you can retrieve the historical data using snapshots

Custom snapshotting is a valuable feature in dbt that allows you to capture and store point-in-time snapshots of your data. It enables historical analysis, data auditing, versioning, and time-travel queries.

By following the steps outlined above and utilizing code snippets and examples, you can implement custom snapshotting in dbt and leverage its benefits to gain deeper insights into your data and make informed decisions based on historical trends.

Happy Snapshotting !

Explore my other articles on dbt series:

Unleashing the Power of dbt: Advanced Techniques and Features

dbt (Data Build Tool) is an open-source command-line tool for modern data transformation and engineering. It allows…

medium.com

dbt Beyond the Basics: Advanced Strategies for Implementation

Advanced dbt topics are useful for data professionals who are looking to enhance their data modeling and analytics…

medium.com

Venturing into Complex Techniques with dbt: Dynamic Data Selection, Lineage, and Governance

DBT (Data Build Tool) is an open-source command-line tool that helps data analysts and engineers transform and manage…

medium.com

Thanks for Reading !

I post about Data , AI , Startups , Leadership, Writing & Culture.

Stay Tuned for my next blog !!

Explore Custom Snapshotting in dbt: A Guide to Capture Historical Data

1. Why is Custom Snapshotting Important?

2. Implementing Custom Snapshotting in dbt

3. Code Snippets and Examples

Unleashing the Power of dbt: Advanced Techniques and Features

dbt (Data Build Tool) is an open-source command-line tool for modern data transformation and engineering. It allows…

dbt Beyond the Basics: Advanced Strategies for Implementation

Advanced dbt topics are useful for data professionals who are looking to enhance their data modeling and analytics…

Venturing into Complex Techniques with dbt: Dynamic Data Selection, Lineage, and Governance

DBT (Data Build Tool) is an open-source command-line tool that helps data analysts and engineers transform and manage…

Written by Rasiksuhail

Responses (1)