-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for setting policy tags for columns in BigQuery #2589
Added support for setting policy tags for columns in BigQuery #2589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick first pass here. Looks like there's one question to answer still about the specifics of the column-level config. Overall, this looks very good, and I appreciate you taking the time on this PR! The tests look great :)
@@ -17,7 +17,7 @@ decorator==4.4.2 | |||
docutils==0.15.2 | |||
google-api-core==1.16.0 | |||
google-auth==1.16.1 | |||
google-cloud-bigquery==1.24.0 | |||
google-cloud-bigquery==1.25.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does 1.25 specifically add support for policy tags? We had some issues with recent releases of a couple of Google libraries, so I just wanted to double check that bumping this version number is required for this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @drewbanin, support for policy tags was added in 1.25.0 which is why I updated the version
https://googleapis.dev/python/bigquery/latest/changelog.html
@@ -610,6 +610,10 @@ def update_column_descriptions(self, relation, columns): | |||
column_config = columns[column.name] | |||
column_dict = column.to_api_repr() | |||
column_dict['description'] = column_config.get('description') | |||
if column_config.get('tags'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use tags
directly -- can we instead create a new config called policy_tags
and use it here? That might involve reaching into the Column
contracts defined in dbt.
Alternatively, I do like the idea of repurposing a property in the meta
dict for this config. I know @jtcohen6 voiced his opposition because meta is intended to be purely informational, which is correct, but I'm looking through the code now and I think adding a new column-level property specific to BigQuery might end up being a can of worms!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree with @jtcohen6 that meta
should stay informational/user-controlled.
I think we are going to have to deal with the idea of adapter-specific column fields (and adapter-specific schema.yml entries), much like we did with config
. Maybe as a first pass, we could change UnparsedColumn
(or even its ancestor class HasDocs
) to derive from ExtensibleJsonSchemaMixin
so it can get arbitrary keys? I haven't really thought about the consequences of that at all, but it's what I'd look at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive of core's classes (HasDocs
, ColumnInfo
) accepting arbitrary keys, as long as there's some way for plugin authors to enumerate those additional keys, similar to AdapterSpecificConfigs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If having HasDocs
derive from ExtensibleJsonSchemaMixin
is the correct solution here, any guidance on how to setup an AdapterSpecificConfigs
equivalent for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @beckjake, I'm more unsure about how to setup the equivalent of a AdapterSpecificColumnInfo
so that I can pass use a BigQueryColumnInfo
with policy_tags
property here but elsewhere it defaults to ColumnInfo
.
I know inside BigQueryAdapter
we set Column = BigQueryColumn
, I'm assuming we'd want to do something similar for the ColumnInfo but not sure how to ensure that the passed in ColumnInfo gets used elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just want to make ColumnInfo
itself allow additional properties, and then the bigquery code that wants it can extract them out if it needs them. Making that aspect of parsing itself adapter-specific would be quite an overhaul.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying @beckjake. Out of curiosity, what's the thinking behind creating the new AdditionalPropertiesMixin
class vs. using AdditionalPropertiesAllowed
and removing the default value for _extra
to get around the TypeError: non-default argument ... follows default argument
error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this answer is kind of circular reasoning, but: Because I want to have a default value for _extra
! It's convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, okay updating accordingly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple notes, really they're more about the passage of time than actual changes. But this looks awesome, thanks for contributing another great bigquery feature!
core/dbt/contracts/graph/unparsed.py
Outdated
@@ -66,13 +66,45 @@ class Docs(JsonSchemaMixin, Replaceable): | |||
show: bool = True | |||
|
|||
|
|||
class AdditionalPropertiesMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this actually exists in dev/marian-anderson
now (sorry, I've been busy since you opened this and should've clarified!) if you rebase/merge in, you can import it from dbt.contracts.util
.
CHANGELOG.md
Outdated
- Added support for setting policy tags for BigQuery columns ([#2586](/~https://github.com/fishtown-analytics/dbt/issues/2586), [#2589](/~https://github.com/fishtown-analytics/dbt/pull/2589)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push this up into the top when you rebase/merge? If we do a release in the meantime (likely), I'll fix it myself! b1 shipped a while ago
CHANGELOG.md
Outdated
@@ -42,7 +43,7 @@ Contributors: | |||
- [@alf-mindshift](/~https://github.com/alf-mindshift) ([#2431](/~https://github.com/fishtown-analytics/dbt/pull/2431)) | |||
- [@scarrucciu](/~https://github.com/scarrucciu) ([#2508](/~https://github.com/fishtown-analytics/dbt/pull/2508)) | |||
- [@southpolemonkey](/~https://github.com/southpolemonkey) ([#2511](/~https://github.com/fishtown-analytics/dbt/issues/2511)) | |||
- [@azhard](/~https://github.com/azhard) ([#2517](/~https://github.com/fishtown-analytics/dbt/pull/2517), ([#2521](/~https://github.com/fishtown-analytics/dbt/pull/2521)), [#2547](/~https://github.com/fishtown-analytics/dbt/pull/2547)) | |||
- [@azhard](/~https://github.com/azhard) ([#2517](/~https://github.com/fishtown-analytics/dbt/pull/2517), ([#2521](/~https://github.com/fishtown-analytics/dbt/pull/2521)), [#2547](/~https://github.com/fishtown-analytics/dbt/pull/2547), [#2588](/~https://github.com/fishtown-analytics/dbt/pull/2588)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this issue up into a new contributors block? I think it will actually exist if you pull in dev/marian-anderson at the moment.
Thanks so much! I'm going to get this merged in for 0.18.0b2. |
resolves #2586
Description
Enables users to set policy tags on BigQuery columns by specifying the policy tag in the resource file for their models.
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.