Summary

The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many potential pitfalls. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro. That’s three free boards at dataengineeringpodcast.com/miro.
Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey, and today I'm interviewing Dustin Dorsey and Cameron Cyr about how to design your dbt projects.

Interview

Introduction
How did you get involved in the area of data management?
What was your path to adoption of dbt?
- What did you use prior to its existence?
- When/why/how did you start using it?
What are some of the common challenges that teams experience when getting started with dbt?
- How does prior experience in analytics and/or software engineering impact those outcomes?
You recently wrote a book to give a crash course in best practices for dbt. What motivated you to invest that time and effort?
- What new lessons did you learn about dbt in the process of writing the book?
The introduction of dbt is largely responsible for catalyzing the growth of "analytics engineering". As practitioners in the space, what do you see as the net result of that trend?
- What are the lessons that we all need to invest in independent of the tool?
For someone starting a new dbt project today, can you talk through the decisions that will be most critical for ensuring future success?
As dbt projects scale, what are the elements of technical debt that are most likely to slow down engineers?
- What are the capabilities in the dbt framework that can be used to mitigate the effects of that debt?
- What tools or processes outside of dbt can help alleviate the incidental complexity of a large dbt project?
What are the most interesting, innovative, or unexpected ways that you have seen dbt used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working with dbt (as engineers and/or as authors)?
What is on your personal wish-list for the future of dbt (or its competition)?

Contact Info

Dustin
- LinkedIn
Cameron
- LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
