Enable running OGE pipeline with Early Release data and PUDL nightly builds #390

Merged
grgmiller merged 8 commits into development from greg/2023 on Sep 20, 2024

@grgmiller (Collaborator) commented Aug 20, 2024

Purpose

To make hourly OGE data available for 2023, this PR updates the pipeline so that it can load and run on EIA early release data.

NOTE: This PR does not check or address the data quality warnings raised for early release data; it only makes the data available. Any data produced for an early release year should be considered "use at your own risk" for now.

This PR also updates the pipeline to work with the most recent stable pudl release, v2024.8.0 (see https://catalystcoop-pudl.readthedocs.io/en/stable/release_notes.html#v2024-8-0-2024-08-19). That release mainly adds new data, but it also fixes some of the issues with missing or inconsistent generator operating dates, and it changes the compressed pudl.sqlite file from gzip to zip.

What the code is doing

We add a new constant, current_early_release_year, which defines the year for which early release data is available.
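
As a rough illustration of how such a constant might be used, the sketch below flags a run that depends on early release data. The value 2023 is assumed from the PR's stated goal, and the helper `check_early_release` is hypothetical, not a function from the OGE codebase.

```python
import warnings

# Assumed value: this PR targets 2023 data. The real constant lives in the
# OGE codebase and may be defined differently.
current_early_release_year = 2023


def check_early_release(year: int) -> bool:
    """Return True, and emit a UserWarning, if `year` relies on early release data.

    Hypothetical helper for illustration only.
    """
    if year == current_early_release_year:
        warnings.warn(
            f"{year} uses EIA early release data; treat outputs as use at your own risk.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```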

When I wrote this code last week, the early release data was only available through the nightly build of pudl, not the stable build. I have therefore updated the data download function to download the nightly build when a new environment variable, PUDL_BUILD, requests it, and to read the pudl database from the matching location (pudl databases are now saved in either pudl/stable or pudl/nightly).
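
A minimal sketch of the build selection described above, assuming PUDL_BUILD falls back to "stable" when unset. The helper names `get_pudl_build` and `pudl_db_path`, and the `downloads_root` argument, are hypothetical and do not come from the PR itself.

```python
import os
from pathlib import Path

VALID_BUILDS = ("stable", "nightly")


def get_pudl_build() -> str:
    """Read PUDL_BUILD from the environment, defaulting to 'stable' (assumed default)."""
    build = os.getenv("PUDL_BUILD", "stable").strip().lower()
    if build not in VALID_BUILDS:
        raise ValueError(f"PUDL_BUILD must be one of {VALID_BUILDS}, got {build!r}")
    return build


def pudl_db_path(downloads_root: Path, build: str) -> Path:
    """Build the database path under pudl/stable or pudl/nightly.

    `downloads_root` is a hypothetical stand-in for wherever the pipeline
    stores downloaded data.
    """
    return downloads_root / "pudl" / build / "pudl.sqlite"
```

Under these assumptions, running with `PUDL_BUILD=nightly` set in the environment would resolve the database to `pudl/nightly/pudl.sqlite` beneath the downloads folder, while leaving the variable unset would resolve to `pudl/stable/pudl.sqlite`.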

Testing

Running the pipeline for 2023 worked without any hard errors. There were a lot more warnings about incomplete data, which is expected.

Review estimate

10-15 min

Future work

Actually validate the early release outputs to ensure they are as complete as possible.

Checklist

  • Update the documentation to reflect changes made in this PR
  • Format all updated python files using black
  • Clear outputs from all notebooks modified
  • Add docstrings and type hints to any new functions created

Base automatically changed from greg/plant_55641 to development August 21, 2024 16:10
@grgmiller changed the title from "Enable running OGE pipeline with Early Release data" to "Enable running OGE pipeline with Early Release data and PUDL nightly builds" Aug 21, 2024
@grgmiller marked this pull request as ready for review August 21, 2024 20:41
@grgmiller requested review from rouille and gailin-p August 21, 2024 20:41
@@ -142,6 +142,7 @@ def main(args):
logger.info("1. Downloading data")
# PUDL
download_data.download_pudl_data(source="aws")
logger.info(f"Using {os.getenv('PUDL_BUILD')} PUDL build")
Collaborator

I think you want to handle the case where PUDL_BUILD is not set:

logger.info(f"Using {os.getenv('PUDL_BUILD', 'stable')} PUDL build")
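
To illustrate why the default matters: `os.getenv` returns `None` when the variable is unset, so the original log line would report a "None" build. This is a small standalone sketch, not code from the PR.

```python
import os

# Simulate an environment where PUDL_BUILD has not been set.
os.environ.pop("PUDL_BUILD", None)

# Without a default, the f-string interpolates the literal text "None".
unset_message = f"Using {os.getenv('PUDL_BUILD')} PUDL build"

# With a default, the message names the build that is used as the fallback.
default_message = f"Using {os.getenv('PUDL_BUILD', 'stable')} PUDL build"

print(unset_message)    # Using None PUDL build
print(default_message)  # Using stable PUDL build
```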

Collaborator Author

Updated

@@ -107,56 +108,77 @@ def download_pudl_data(source: str = "aws"):
Args:
source (str, optional): where to download pudl from, either 'aws' or 'zenodo'.
Defaults to 'aws'.
build (str): whether to download the "stable" or "nightly" build

Raises:
ValueError: if `source` is neither 'aws' nor 'zenodo'.
Collaborator

You can add the UserWarning to the docstring.

Collaborator Author

added

@rouille (Collaborator) left a comment

LGTM

@grgmiller merged commit e8bfef7 into development Sep 20, 2024
2 checks passed
@grgmiller deleted the greg/2023 branch September 20, 2024 17:00
@grgmiller mentioned this pull request Dec 21, 2024