Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/defer to prod #2656

Merged
merged 4 commits into from
Jul 31, 2020
Merged

Feature/defer to prod #2656

merged 4 commits into from
Jul 31, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Jul 28, 2020

resolves #2527

Description

Adds --defer and --state arguments, they behave as described in #2527

I had to update hologram and make some small tweaks to support parsing the manifest from disk. The compiled field is now mandatory for compiled nodes, and while it may exists for a short time in a False state, users should never experience a situation where compiled=False.

The code as it is replaces all unselected nodes with entries from a previous run. I'm not sure this is as intuitive as we might want!

For example, if you have seeds you ref, and you run dbt run --defer --state ./state, the seeds will still point at the current run's information, because the implicit fqn:* selector did select seeds! I think we can make some tweaks to node selection to fix this if we want, but for now I left it as-is.

Ephemeral nodes are never deferred.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jul 28, 2020
@jtcohen6
Copy link
Contributor

This is really exciting. Pulling on two of the threads you mention above:

For example, if you have seeds you ref, and you run dbt run --defer --state ./state, the seeds will still point at the current run's information, because the implicit fqn:* selector did select seeds! I think we can make some tweaks to node selection to fix this if we want, but for now I left it as-is.

This is the expected behavior IMO, since it ensures that dbt run --defer --state ./state == dbt run. If someone wanted to defer seed files only (which would surprise me), I'd expect them to do something a little more explicit like dbt run --defer --state ./state --exclude my_one_seed my_other_seed.

Ephemeral nodes are never deferred.

To clarify my understanding: Ephemerals are passthroughs, and so any of their references to non-ephemeral, unselected models will be deferred. To that end, it doesn't matter whether the ephemeral models themselves are included or excluded from the set of selected nodes.

@beckjake
Copy link
Contributor Author

Yeah. I played around with a bit and deferring to ephemeral models really violates my expectations about defer: in particular that it only defers to the file for "how do I refer to this model's existence in the database?". That's a nonsensical question for ephemeral models: it's their contents that are important downstream, not their names!

@beckjake beckjake force-pushed the feature/defer-to-prod branch 2 times, most recently from e7c624e to f58e338 Compare July 29, 2020 21:02
@beckjake
Copy link
Contributor Author

I've rebased onto the newly-merged dev/marian-anderson and added some more tests. I'll mark this as ready for review as soon as tests are passing, but some local spot testing looked ok

@beckjake beckjake marked this pull request as ready for review July 30, 2020 14:01
@beckjake beckjake requested review from jtcohen6 and kwigley July 30, 2020 14:01
Copy link
Contributor

@kwigley kwigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I played around with our internal analytics project, and everything felt intuitive, including ephemeral model behavior. Just two small questions from me.

core/dbt/contracts/graph/manifest.py Show resolved Hide resolved
def _get_state_path(self) -> Path:
if self.args.state is not None:
return self.args.state
elif ARTIFACT_STATE_PATH is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Glad to see that ARTIFACT_STATE_PATH and --state are mutually substitutable, to support Cloud deployment.

I think the conclusion of this conversation was to add a separate environment variable for deferral as well. Let me know if you disagree.

Copy link
Contributor Author

@beckjake beckjake Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that! It's set up differently because it's a boolean rather than a value, but it's defined in dbt/flags.py. It being set/unset informs the default value for the --defer/--no-defer argument to change.

I'll unify this logic to all go in flags + main.

Jacob Beck added 4 commits July 31, 2020 09:24
- Actually set deferred=True
- add a test for that
- combine/unify environment vars vs cli vars behavior between defer/state better
  - make the state path start with DBT_
@beckjake beckjake force-pushed the feature/defer-to-prod branch from d13a25b to 2ed9764 Compare July 31, 2020 15:25
@beckjake beckjake merged commit 1fb8c9c into dev/marian-anderson Jul 31, 2020
@beckjake beckjake deleted the feature/defer-to-prod branch July 31, 2020 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Defer to prod manifest for partial runs
3 participants