Enable running OGE pipeline with Early Release data and PUDL nightly builds #390
src/oge/data_pipeline.py
Outdated
@@ -142,6 +142,7 @@ def main(args):
logger.info("1. Downloading data")
# PUDL
download_data.download_pudl_data(source="aws")
logger.info(f"Using {os.getenv('PUDL_BUILD')} PUDL build")
I think you want to handle the case where PUDL_BUILD is not set:
logger.info(f"Using {os.getenv('PUDL_BUILD', 'stable')} PUDL build")
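A minimal sketch of the fallback suggested here, assuming two hypothetical helpers (`get_pudl_build` and `pudl_folder` are illustrative names, not functions from this PR). It reads PUDL_BUILD with a default of "stable", rejects anything other than the two builds named in the PR, and maps the build to the pudl/stable or pudl/nightly subfolder mentioned in the description:

```python
import os
from pathlib import Path

# The two builds named in this PR.
VALID_BUILDS = ("stable", "nightly")


def get_pudl_build() -> str:
    """Return the PUDL build to use, defaulting to 'stable' if PUDL_BUILD is unset.

    Hypothetical helper for illustration only.
    """
    build = os.getenv("PUDL_BUILD", "stable")
    if build not in VALID_BUILDS:
        raise ValueError(f"PUDL_BUILD must be one of {VALID_BUILDS}, got {build!r}")
    return build


def pudl_folder(downloads_dir: str) -> Path:
    """Return <downloads_dir>/pudl/<build> (assumed layout, hypothetical helper)."""
    return Path(downloads_dir) / "pudl" / get_pudl_build()
```

Without the default, `os.getenv('PUDL_BUILD')` returns `None` when the variable is unset, so the log line would read "Using None PUDL build".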
Updated
@@ -107,56 +108,77 @@ def download_pudl_data(source: str = "aws"):
Args:
source (str, optional): where to download pudl from, either 'aws' or 'zenodo'.
Defaults to 'aws'.
build (str): whether to download the "stable" or "nightly" build

Raises:
ValueError: if `source` is neither 'aws' or 'zenodo'.
You can add the UserWarning to the docstring.
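One way this could look, as a sketch only: Google-style docstrings like the one in the diff can list warnings in a "Warns:" section alongside "Raises:". The function below is a hypothetical stand-in, not the PR's `download_pudl_data`, and the condition that triggers the warning (choosing the nightly build) is an assumption made for illustration, since the thread does not say when the UserWarning is emitted.

```python
import warnings


def check_build(build: str = "stable") -> str:
    """Validate a PUDL build name (hypothetical helper, not the PR's function).

    Args:
        build (str): whether to use the "stable" or "nightly" build.

    Returns:
        str: the validated build name.

    Raises:
        ValueError: if `build` is neither 'stable' nor 'nightly'.

    Warns:
        UserWarning: if `build` is 'nightly' (assumed trigger, illustration only).
    """
    if build not in ("stable", "nightly"):
        raise ValueError(f"build must be 'stable' or 'nightly', got {build!r}")
    if build == "nightly":
        # Assumed condition: the real trigger in the PR is not shown in this thread.
        warnings.warn("Nightly builds may contain incomplete data.", UserWarning)
    return build
```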
added
LGTM
Purpose
In order to access hourly OGE data for 2023, this PR updates the pipeline to access and run EIA early release data.
NOTE: this PR does nothing to check or address data quality warnings raised for early release data, but just makes it available. Any data produced for an early release year should be considered "use at your own risk" for now.
This also updates the pipeline to work with the most recent stable pudl release: v2024.8.0 (see https://catalystcoop-pudl.readthedocs.io/en/stable/release_notes.html#v2024-8-0-2024-08-19). This release mostly just adds new data, but also fixes some of the issues with missing/inconsistent generator operating dates, and changes the pudl.sqlite file from a gzip to a zip.
What the code is doing
We add a new constant, current_early_release_year, which defines the year for which early release data is available.
When I wrote this code last week, the early release data was only available through the nightly build of pudl, not the stable build. The data download function now downloads the nightly build if a new environment variable, PUDL_BUILD, specifies it, and reads the pudl database from the matching location (pudl databases are now saved in either pudl/stable or pudl/nightly).
Testing
Running the pipeline for 2023 worked without any hard errors. There were a lot more warnings about incomplete data, which is expected.
Review estimate
10-15 min
Future work
Actually validate the early release outputs to ensure they are as complete as possible.
Checklist
black