-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't create bigquery dataset if it already exists #1569
Don't create bigquery dataset if it already exists #1569
Conversation
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
/lgtm |
@@ -102,8 +102,15 @@ def get_historical_features( | |||
|
|||
assert isinstance(config.offline_store, BigQueryOfflineStoreConfig) | |||
|
|||
# We should create a new dataset if the dataset name was not overridden in the config | |||
should_create_dataset = "dataset" not in config.offline_store.__fields_set__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't 100% agree with this approach. What if the dataset already exists but its not set in the config? I think we should just manually check whether the dataset exists before we run the create command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll require an additional bigquery.datasets.get
permission on the dataset in question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically we'll be saying that you need either bigquery.datasets.create
and bigquery.datasets.get
permissions if the dataset does not exist, or just bigquery.datasets.get
permission if it already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So get
permissions is too much? It seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, its probably possible for us to get around the bigquery.datasets.get
permission by trying an operation inside the dataset and seeing if it succeeds (and which exception is thrown).
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Codecov Report
@@ Coverage Diff @@
## master #1569 +/- ##
==========================================
+ Coverage 77.40% 83.55% +6.15%
==========================================
Files 64 65 +1
Lines 5624 5754 +130
==========================================
+ Hits 4353 4808 +455
+ Misses 1271 946 -325
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
client.get_dataset(dataset) | ||
except NotFound: | ||
# Only create the dataset if it does not exist | ||
client.create_dataset(dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we removed exists_ok=True
? Wouldn't we introduce a race condition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to remove it because the except block would only be executed if the dataset doesn't exist. I didn't think about race conditions. I'm adding this back.
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tsotnet, woop The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
* Don't create bigquery dataset if dataset field is defined in config Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai> * Try getting the dataset before creating it Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai> * Remove extra client.get_dataset Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai> * Add exists_ok=True back Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Tsotne Tabidze tsotne@tecton.ai
What this PR does / why we need it: In bigquery.py, first call
client.get_dataset
and only callclient.create_dataset
if it doesn't exist. This way users can manually create the dataset and only grantbigquery.datasets.get
permission (notbigquery.datasets.create
) to Feast.Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: