Support for deference to models imported from a package. #3309
Comments
@randypitcherii and I had a very cool conversation about this offline, and I want to summarize some of what we discussed here. The words here can be a bit confusing, and the possibilities are quite exciting. Let's take as our starting point the functionality that exists today, and the use case of a large company with a Data Foundation team supporting embedded analytics teams in other business units. An analyst in an embedded team could "import" the foundational package (
Here's the kicker: Once they've done that, they can accomplish exactly what Randy outlined above by running:
And that's it. dbt knows about the Foundation team's models because In two crucial ways, this approach is preferable to redefining the foundation package models via
This is all possible today. It's actually one of the use cases we imagined when originally shaping defer functionality (#2527).
Future art?
We could consider making this syntax slicker by turning
It isn't currently possible to defer to / compare state against more than one manifest. If there are many foundational packages, all of which want to be imported-and-deferred-to, it would be amazing if dbt could read from multiple manifests to compare state. In the meantime, it seems like there are two reasonable options:
Lastly, would we consider formally adding a
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Describe the feature
Today, importing dbt packages works great for things that do not get materialized - so macros, custom materializations, analyses, things like that.
Importing even works well for things that do get materialized, as long as the models defined by the external package don't already exist in your target warehouse.
However, consider the use case of a large company with a data foundation team that supports embedded analytics teams in other business units. The embedded teams cannot effectively import the models from the data foundation team's dbt project without also rematerializing these models in the shared target warehouse.
In other words, it's not possible today to import packages with models that are already materialized in your target warehouse without rematerializing those models.
To address this, I think it'd be cool if there was a way (similar to slimCI) where imported packages could be associated with a run artifact that allows a dbt project to defer to those materialized models without rebuilding them. This is just one thought - I think there is probably a wide selection of ways to address this.
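To make the idea a bit more concrete, here is a rough Python sketch of what "deferring" to an imported package could look like: given an artifact describing the foundation team's already-built models, a reference resolves to the existing relation instead of triggering a rebuild. This is not dbt's implementation. The manifest contents, the package name `foundation`, and the helper `resolve_deferred_ref` are all made up for illustration, and the sketch assumes the artifact records each model's database, schema, and alias.

```python
from typing import Optional

# Hypothetical, trimmed-down artifact for the foundation package.
# A real artifact would carry many more fields; only those used below appear.
FOUNDATION_MANIFEST = {
    "nodes": {
        "model.foundation.dim_customers": {
            "resource_type": "model",
            "package_name": "foundation",
            "name": "dim_customers",
            "database": "analytics",
            "schema": "foundation",
            "alias": "dim_customers",
        },
        "test.foundation.not_null_dim_customers_id": {
            "resource_type": "test",
            "package_name": "foundation",
            "name": "not_null_dim_customers_id",
            "database": "analytics",
            "schema": "foundation_tests",
            "alias": "not_null_dim_customers_id",
        },
    }
}


def resolve_deferred_ref(manifest: dict, package: str, model: str) -> Optional[str]:
    """Return the existing relation for an imported model, or None.

    None means the artifact has no record of the model, so the caller
    would have to build it instead of deferring.
    """
    for node in manifest.get("nodes", {}).values():
        if (
            node.get("resource_type") == "model"
            and node.get("package_name") == package
            and node.get("name") == model
        ):
            # Fall back to the model name when no alias is recorded.
            identifier = node.get("alias") or node["name"]
            return f'{node["database"]}.{node["schema"]}.{identifier}'
    return None


relation = resolve_deferred_ref(FOUNDATION_MANIFEST, "foundation", "dim_customers")
print(relation)
```

The point of the sketch is that the downstream project never has to hard-code where the foundation models live: the location comes from the artifact, so it follows the foundation project if that project changes.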
Describe alternatives you've considered
I believe there is some clever overriding you can do to the ref function to point to pre-configured locations when referencing an imported model. I haven't tried to make this work but I believe even with such logic, it'd be a pain to maintain information about where these models exist in the warehouse if things were to change in the imported project. This is a painful coordination problem.
The most obvious alternative is to not let companies break dbt projects into separate repos unless they have entirely independent lineages. This means macros can be shared really easily but model definitions must stay isolated.
Lastly, you could redefine the materialized models as sources in the downstream project. This would break lineage documentation across the entire pipeline.
Additional context
I think this is such a hard problem. I think many other package management systems don't have to worry so much about this because they largely import functionality (like macros) rather than definitions of potentially-existing entities for the purpose of creating these entities only if they don't exist in an arbitrary location - it's really tough!
Who will this benefit?
This will benefit any organization that does all of the following:
In more human language - this will benefit typically larger companies.
Are you interested in contributing this feature?
I think I'm about 2 orders of magnitude too dumb to help much here hahahaha, but of course I'd love to.