-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow configurable include_policy for snowflake #1603
Comments
@drewbanin - I'm happy to have a crack at this, I've been meaning to contribute for a while. Is it going to be easier to do this specifically for snowflake, or something common across adapters? In either case, can you point me at where the meat of the configurable |
Cool @alanmcruickshank - that sounds awesome! I started digging into the code here -- it certainly does look like you'll want to change I'm afraid I don't even have a good place to point you -- this logic is currently baked way into a lot of different places in the dbt codebase. Happy to investigate, but also wanted to give you something helpful before the end of the week: One terrible, but possibly interesting approach could be to make your own wrapper around {% macro relative_ref(model_name) %}
{% set rel = ref(model_name) %}
{% if rel is string %}
{# This is an ephemeral model #}
{% do return(rel) %}
{% else %}
{% set new_rel = rel.include(database=False) %}
{% do return(new_rel) %}
{% endif %}
{% endmacro %} The follow example shows that select * from {{ relative_ref('stg_snowplow_events') }}
union all
select * from {{ ref('stg_snowplow_events') }}
-- compiles to:
select * from analytics.stg_snowplow_events
union all
select * from analytics.analytics.stg_snowplow_events So, I really want to be clear that this isn't a good idea, but it's one idea that came to mind in thinking about how this could work. In playing around with this code, I recognized a distinction that I wasn't considering before. The
In this case, it doesn't sound like there's any problem including the database name when the relations are created, right? Your use-case requires that the database is excluded when models are referenced. Since all model references go through the I can think of other types of cool things we could do here, like hacking |
@drewbanin - I love the ideas! I'd love to have confidence that whatever I actually do is having the effect that I'd like. I notice that there's no unit tests in place that directly test the I'm assuming they'd go in the |
I think that's reasonable! What are you actually planning on implementing? I wrote this up on a Friday afternoon, so I'd also love to get @beckjake's thoughts on the topic too :) |
I am always in favor of more unit tests! We do test it pretty heavily implicitly via our integration suite, but that's never enough. My instinct is that that the right approach here is to alter how The big problem is that we don't use class RelationProxy
def __init__(self, adapter, model):
self.adapter = adapter
self.model = model
def create(self, *args, **kwargs):
kwargs['quote_policy'] = dbt.utils.merge(
self.quoting_config,
kwargs.pop('quote_policy', {})
)
result = self.relation_type.create(*args, **kwargs)
if result.identifier == self.model.alias:
# handle adapters that must default to include_policy['database'] is False
result = result.include(**result.DEFAULTS['include_policy'])
return result Which, ugh, I mean it'll work but you also have to pass model in to the DBWrapper, and that sure feels like a lot of special-ness! |
Great question. I've got two trains of thought, and I think I prefer the first one over the second one - but open to implementing the other (or both while I'm at it if you prefer).
I'm leaning much more toward the first idea - what do you guys think? Not sure if all the config levels are necessary. Maybe just the schema level config, with a |
I am amenable to the idea of a I definitely would rather not have it be available in the root of the project config. If you look at how dbt handles quoting, pushing that stuff up from the base project level is pretty convoluted and painful and involves changing a lot. But it's really easy to do it from the schema level, I think you'd just add it to the SourceConfig class and change I think So when choosing between those two, the question really is: who knows that the ref should be relative to the call, the referrer or the referent? I think it's probably the referrer, but I would defer to your expertise/needs on this! |
I want to try and get a rendering context so that I can test that rendering in the unit tests. The function I think I want to test is @beckjake - is there a way of getting a rendering context so that I can test rendering without requiring an actual database connection (or a way of mocking enough things to achieve the same result)? Or am I barking up the wrong tree and should I just resign myself to having to test this as an integration test? |
I don't know if there's a good way, this is why we end up with a lot of integration tests, unfortunately. You could use A better path, though still a lot of work, might be to use But also, if that seems hard or you get stuck, don't feel bad at all about making it an integration test. |
Hey guys - I've finally got the compiler into a test harness so that I can make sure I don't break anything in the process. 😃 I've also got some more detail on the plan of what I'll implement. On the schema config, there will be a config option called Within each model the This solves my use case, but is it worth having something more controllable? My original suggestion had the idea of being able to specify an |
Continuing the discussion from #1624 , which is now just the unit tests. @drewbanin - I've taken a look at the macro framework, and I've hit a roadblock - which I'm not sure how to get past. Patching {% macro ref(model) %}
{{ log("Patched Ref!", True) }}
{{ ref(model) }}
{% endmacro %} But that inner
Both feel a bit hacky, I'm leaning slightly toward the first one because I don't know how we'd achieve the latter. Any ideas? |
Yeah - I think we'll need to preserve the dbt builtin macros in the context. I think this is as easy as adding a line like:
somewhere around here So, that solves the masking issue - the The bigger problem exists around packages: The macros that a package can access are limited to:
So, we'd in fact need to make I'm not exactly certain where that code should live (in It's conceivable that users wouldn't actually implement This feels kind of roundabout, but I think it's a more general solution to this class of problem. We have some other issues like #1612 and #1592 which could also be addressable using the approach outlined here. You buy this? |
Yep I buy this. I think you put it quite nicely in the details for #1592.
I'm totally bought into being able to extend the I wonder if there's a different way of looking at this problem? Rather than override the Let's take two of the use cases we've been discussing. 1) relative refs, and 2) the functionality from #1612 where we might point to a different root schema in development. If we wanted to do both, and we just overrode the
In specifying it like that, we're clear that first we apply the relative ref, and then override that with the dev references if appropriate. We are explicit about overrides. I think there could be a macro equivalent of this... That also fits nicely with your ideas I think, that a |
Sure, I think that overriding the Unsure about the macro chaining solution though: I don't quite understand how So, I think one macro is probably appropriate here -- users can always make multiple helper macros, then chain them together themselves as appropriate. Let's not try to tackle all of these things at once: I'm happy to start by solving the problem stated here (make it possible to render relation names without a database/schema). We can build on top of this in future iterations for sure. I think a good way to proceed is:
I just did barely enough work here to assess the feasibility of this plan, so please let me know if i missed anything glaring! |
closed by #2028 |
Feature
Feature description
Copied from a discussion between @alanmcruickshank and @drewbanin on slack
Question
We've got an issue with deploying our DBT models in snowflake. In short we build in one
database
(a staging database) and then atomically rename with a livedatabase
. That means users never see an interruption in the things they're querying because it's an atomic rename (ALTER DATABASE ... SWAP WITH
).The problem we have is that the
SWAP WITH
command is not swapping everything tidily. All the tables (materialised views) swap fine (:+1: :slightly_smiling_face: ) but the views still refer to thestaging
database (:thumbsdown: :disappointed: ). I thought this was a problem with snowflake so I've been talking to their support, but I think it might actually be about howdbt
:dbt: executes build commands on snowflake.Example 1 (of one way of doing what dbt is doing, and getting the result I was hoping for)
Example 2 (of the other way of doing what dbt is doing, and getting the result I'm actually getting)
From the results I'm getting it looks like
dbt
:dbt: is doing the latter, which makes total sense if you're potentially deploying to multiple databases and want to be specific. It makes life really hard though in the case that we're trying to use dbt in to deploy things, where we'd actively like it to not specify the database at build time.I had a look at the source, and I wonder if this could be an option for the snowflake connector. I got lost in the source code around
include_policy
, which looks like it might be relevant.Has anyone else solved this problem? ... or if not @drewbanin banin I'm happy to do some leg work to get this feature in, but I might need some pointers on exactly how the object names are resolved.
Suggestion
You’re totally right to identify that
include_policy
is probably the right place to look in the codebaseSomebody, somewhere, asked about this exact use case very recently but I can’t seem to find it 😞 Will follow up here when I do!
We made a change to dbt in the 0.13.x releases which began interpolating database names when models were
ref
erenced. This is indeed controlled by theinclude_policy
, and is intended to make the destination database for models configurable. So, in recent versions of dbt, you should be able to run:This will configure the model to be built into the specified database. Further, models that reference this model will look for it in the right place!
I’d like to continue to support this use case, but I also totally buy the merits of the use case that you’re describing. So, maybe instead of removing the database from the
include_policy
, we should expose a configuration similar to what we do forquoting
. Unsure if you’re familiar, but thequoting:
config indbt_project.yml
overrides the defaultquote_policy
for relations. I think that by settinginclude_policy.database
toFalse
, dbt should do pretty much exactly what you need here. There might be tiny other changes to make, but this is going to be the place to start I think!If you don’t mind creating an issue, I’d be happy to follow up with some pointers to the relevant lines of code 🙂
Who will this benefit?
This will benefit anyone doing blue/green style in snowflake. In particular people with large deployments with a mix of tables and views.
The text was updated successfully, but these errors were encountered: