Skip to content
This repository has been archived by the owner on Aug 8, 2024. It is now read-only.

Add CI and ability to control simulation dates #35

Merged

Conversation

dafyddstephenson
Copy link
Contributor

Sorry for another big PR, I was hoping to just work on this in the background thinking the repo cleanup (issue CWorthy-ocean/C-Star#2) would be done a lot more quickly, but I think it's now probably more appropriate to get this merged first.

Summary of changes

1. Introduction of CI:

A GitHub runner now configures the environment, installs C-Star, creates aroms_marbl_example case from a blueprint, and calls setup, build , pre_run , run, and post_run on it. The case is run for 30 model minutes.

2. Add ability to control simulation dates

start_date and end_date attributes added to Case and InputDataset objects, the latter to determine whether, e.g. a specific forcing file is needed and proceed accordingly. valid_start_date and valid_end_date attributes are also defined on Case objects, to specify the valid date range supported and check the user's provided start and end dates are compatible.time_step attribute added to ROMSComponent objects.


Full description of changes:

1. Introduction of CI:

Changes:
New files

  • .github/workflows/tests.yaml which runs...
  • cstar_ocean/tests/test_roms_marbl_example.py which runs the roms_marbl_example blueprint for 30 model minutes
  • ci/environment.yml to manage conda env on github runner
  • coverage.toml file for code coverage

2. Add ability to control simulation dates

Modifications:

  • cstar_case.py:
    • add start_date, end_date, valid_start_date, and valid_end_date attributes to Case class. These can be either datetime objects or strings, but strings are parsed by dateutil.parser to create datetime objects. The former belong to the specific Case object created by the user and are not written to or read from a blueprint yaml file, whereas the latter belong to the blueprint. If no range of valid dates is provided when creating a Case object, the user is warned. If no start and end dates are provided, the default behaviour is to set them as the earliest and latest valid dates, respectively. If these are also not provided, an error is raised.
    • Accordingly, add checks whether the user's provided start_date and end_date fit within the valid date range
    • Add start_date and end_date args to Case.from_blueprint() method signature as these are not present in the blueprint.
    • Add block to Case.from_blueprint() parsing the valid date range from the yaml file.
    • Add start_date and end_date entries to input_dataset fields in the YAML file: as we only want to fetch netcdf files whose own date range overlaps the Case's date range (e.g. we only want the january surface forcing file if we're only running for the first week of the year), these attributes help determine that. NOTE: we are here adding more metadata to the blueprint that should really be read from the dataset itself. It seems like @TomNicholas VirtualiZarr work could be relevant here?
    • (tidy up redundant code related to previous change disallowing Components to use lists of AdditionalCode objects, L600)
    • Extend Case.check_is_setup() method to only check for the local presence of input datasets whose date range overlaps the Case's date range
    • Extend Case.setup() method to only call InputDataset.get(self.caseroot) if an input dataset's date range overlaps the Case's date range
    • Add some logic to Case.run() to provide the n_time_steps argument to ROMSComponent.run() based on the time step and date range.
  • component.py :
    • add time_step attribute to ROMSComponent subclass
    • add conditional logic on ROMSComponent.pre_run() to only partition input datasets that exist locally (based on new behaviour in input_dataset.py where input datasets are only fetched locally if they overlap the user's chosen simulation dates)
    • Change use of InputDataset.local_path in ROMSComponent.pre_run() to reflect changes in input_dataset where the local path attribute now provides the full path to the file, not just its parent directory
    • Add steps to ROMSComponent.pre_run() to modify the ROMS runtime namelist. As ROMS opens all input datasets specified in the namelist on init, we need to dynamically modify the namelist to only describe datasets for this case that exist locally (i.e. ones which overlap the simulation period). This is done by including a "template" namelist with placeholder strings, e.g. __INITIAL_CONDITION_FILE_PLACEHOLDER__ that are replaced during pre_run() with the correct path to the IC file by C-Star.
    • add n_time_steps argument to ROMSComponent.run(). The argument is optional, and defaults to 1, but the user will be warned if this behaviour is invoked.
    • Add a step to ROMSComponent.run() to modify the namelist, adding the number of timesteps by replacing the placeholder string __NTIMES_PLACEHOLDER__
  • input_dataset.py:
    • Add start_date and end_date attributes to InputDataset class (describing for example the earliest and latest entries in a forcing file). Can be datetime or str, but strings are parsed into datetimes by dateutil.parser
    • Change InputDataset.local_path attribute to reflect full path to dataset file, not just parent directory, consistently with other path-related attributes elsewhere in C-Star

“Dafydd and others added 20 commits July 24, 2024 18:09
include coverage badge
- Case now has a "start_date" and "end_date" attribute specified at initialisation, even when reading from a blueprint file (the user's desired run dates are not relevant to the blueprint)
- Case now has "valid_start_date" and "valid_end_date" attributes which /are/ defined as part of the blueprint: this is the full range of dates supported by the blueprint
- When instantiating a case, the user's start and end dates are checked against the valid start and end dates
- InputDataset now has an optional "start_date" and "end_date" attribute/yaml entry to avoid downloading unnecessary files (e.g. forcing files outside the user's desired date range) during Case.setup
Still to do:
- Modify the runtime namelists using these start and end dates on call to .run() so the simulation actually behaves consistently with the Case it is represented by

Other:
- Bugfix where ruff was removing an "unused" import of cstar_local_config in environment.py: have added a noqa comment to prevent this for now
- Blueprint discretization entry now has a "time_step" field from which the number of time steps to run with is calculated
- ROMSComponent.pre_run() now modifies a template namelist, replacing a series of placeholder strings with paths only to any files needed for the chosen simulation dates
- ROMSComponent.run() is now called with an "n_time_steps" arg, based on which it modifies the "__NTIMES_PLACEHOLDER__" string in the namelist at runtime
README.md Outdated
@@ -1,3 +1,4 @@
[![codecov](https://codecov.io/github/dafyddstephenson/C-Star/branch/automated_testing/graph/badge.svg?token=Z0L4U76WSG)](https://codecov.io/github/dafyddstephenson/C-Star)
Copy link

@NoraLoose NoraLoose Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the path so it reflects the coverage of the CWorthy-ocean/C-Star main branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I think it's done now but will have to check it's worked after the merge

@TomNicholas
Copy link
Member

Nice @dafyddstephenson ! Yes I would love to see some of these PRs broken up haha

add start_date, end_date, valid_start_date, and valid_end_date attributes

This is interesting - is it the first entry in the blueprint that has anything to do with whether a blueprint is "validated"? Because if so then CWorthy-ocean/C-Star#8 is relevant. We should think about whether time domain validity is defined for a whole blueprint, or per-component/dataset or what. I'll suggest that only an entire blueprint can be meaningfully validated, and so we should have a separate top-level section in the blueprint that contains everything relevant to validation status, including valid_start_date, and valid_end_date.

NOTE: we are here adding more metadata to the blueprint that should really be read from the dataset itself. It seems like @TomNicholas VirtualiZarr work could be relevant here?

Also interesting. I think that yes VirtualiZarr could be useful as a way to make it easier to access important information about an input dataset without having to open all the files in it. However while it's tempting to imagine putting cstar-relevant data in the metadata of a virtual zarr store pointing at input netCDF files, I'm not sure that would be the correct separation of concerns (because that additional data is not really part of the dataset, it's our judgement about the dataset).

@dafyddstephenson
Copy link
Contributor Author

Nice @dafyddstephenson ! Yes I would love to see some of these PRs broken up haha

😬 ! This was supposed to be at least two... I'll hold off on any further development until that repo cleaning issue is closed I think. sorry!

This is interesting - is it the first entry in the blueprint that has anything to do with whether a blueprint is "validated"?

It doesn't strictly relate to validation status, though I would say that these entries should be a requirement for validation. In this case for example forcing files are only defined for the year of 2012, so even though this blueprint isn't "validated" per se, it has a natural set of time boundaries that set the valid date range.

However while it's tempting to imagine putting cstar-relevant data in the metadata of a virtual zarr store pointing at input netCDF files, I'm not sure that would be the correct separation of concerns (because that additional data is not really part of the dataset, it's our judgement about the dataset).

Indeed, I wouldn't suggest we add C-Star-related metadata to input datasets, but I think things like just accessing the time bounds would mean we could set start_date and end_date for an input dataset by just reading the metadata, rather than manually typing them into the blueprint file or __init__ call.

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's just separate out any important TODOs as issues then please merge!

…ction in place of copy-pasted code in AdditionalCode.get() and BaseModel.get()
@dafyddstephenson dafyddstephenson marked this pull request as draft August 8, 2024 18:08
“Dafydd added 2 commits August 8, 2024 13:18
…l_environmet() which is called by environment.py
…l_environmet() which is called by environment.py [typo fix]
@dafyddstephenson dafyddstephenson marked this pull request as ready for review August 8, 2024 19:28
@dafyddstephenson
Copy link
Contributor Author

Have now:

  • changed cstar_local_config.py to define a function set_local_environment() which is called by environment.py rather than it importing cstar_local_config as a whole to force execution
  • added a _clone_and_checkout utility to utils.py to avoid 3 instances of repeated code in additional_code.py and base_model.py

Have also added some more TODO comments that I'll turn into issues once this is merged (just so I can highlight the code in question on GitHub and create an issue from it directly). Think we're ready to merge.

@dafyddstephenson dafyddstephenson merged commit 6fcbc6f into CWorthy-ocean:main Aug 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants