Add CI and ability to control simulation dates #35

dafyddstephenson · 2024-08-01T19:34:43Z

Sorry for another big PR, I was hoping to just work on this in the background thinking the repo cleanup (issue CWorthy-ocean/C-Star#2) would be done a lot more quickly, but I think it's now probably more appropriate to get this merged first.

Summary of changes

1. Introduction of CI:

A GitHub runner now configures the environment, installs C-Star, creates aroms_marbl_example case from a blueprint, and calls setup, build , pre_run , run, and post_run on it. The case is run for 30 model minutes.

2. Add ability to control simulation dates

start_date and end_date attributes added to Case and InputDataset objects, the latter to determine whether, e.g. a specific forcing file is needed and proceed accordingly. valid_start_date and valid_end_date attributes are also defined on Case objects, to specify the valid date range supported and check the user's provided start and end dates are compatible.time_step attribute added to ROMSComponent objects.

Full description of changes:

1. Introduction of CI:

Changes:
New files

.github/workflows/tests.yaml which runs...
cstar_ocean/tests/test_roms_marbl_example.py which runs the roms_marbl_example blueprint for 30 model minutes
ci/environment.yml to manage conda env on github runner
coverage.toml file for code coverage

2. Add ability to control simulation dates

Modifications:

cstar_case.py:
- add start_date, end_date, valid_start_date, and valid_end_date attributes to Case class. These can be either datetime objects or strings, but strings are parsed by dateutil.parser to create datetime objects. The former belong to the specific Case object created by the user and are not written to or read from a blueprint yaml file, whereas the latter belong to the blueprint. If no range of valid dates is provided when creating a Case object, the user is warned. If no start and end dates are provided, the default behaviour is to set them as the earliest and latest valid dates, respectively. If these are also not provided, an error is raised.
- Accordingly, add checks whether the user's provided start_date and end_date fit within the valid date range
- Add start_date and end_date args to Case.from_blueprint() method signature as these are not present in the blueprint.
- Add block to Case.from_blueprint() parsing the valid date range from the yaml file.
- Add start_date and end_date entries to input_dataset fields in the YAML file: as we only want to fetch netcdf files whose own date range overlaps the Case's date range (e.g. we only want the january surface forcing file if we're only running for the first week of the year), these attributes help determine that. NOTE: we are here adding more metadata to the blueprint that should really be read from the dataset itself. It seems like @TomNicholas VirtualiZarr work could be relevant here?
- (tidy up redundant code related to previous change disallowing Components to use lists of AdditionalCode objects, L600)
- Extend Case.check_is_setup() method to only check for the local presence of input datasets whose date range overlaps the Case's date range
- Extend Case.setup() method to only call InputDataset.get(self.caseroot) if an input dataset's date range overlaps the Case's date range
- Add some logic to Case.run() to provide the n_time_steps argument to ROMSComponent.run() based on the time step and date range.
component.py :
- add time_step attribute to ROMSComponent subclass
- add conditional logic on ROMSComponent.pre_run() to only partition input datasets that exist locally (based on new behaviour in input_dataset.py where input datasets are only fetched locally if they overlap the user's chosen simulation dates)
- Change use of InputDataset.local_path in ROMSComponent.pre_run() to reflect changes in input_dataset where the local path attribute now provides the full path to the file, not just its parent directory
- Add steps to ROMSComponent.pre_run() to modify the ROMS runtime namelist. As ROMS opens all input datasets specified in the namelist on init, we need to dynamically modify the namelist to only describe datasets for this case that exist locally (i.e. ones which overlap the simulation period). This is done by including a "template" namelist with placeholder strings, e.g. __INITIAL_CONDITION_FILE_PLACEHOLDER__ that are replaced during pre_run() with the correct path to the IC file by C-Star.
- add n_time_steps argument to ROMSComponent.run(). The argument is optional, and defaults to 1, but the user will be warned if this behaviour is invoked.
- Add a step to ROMSComponent.run() to modify the namelist, adding the number of timesteps by replacing the placeholder string __NTIMES_PLACEHOLDER__
input_dataset.py:
- Add start_date and end_date attributes to InputDataset class (describing for example the earliest and latest entries in a forcing file). Can be datetime or str, but strings are parsed into datetimes by dateutil.parser
- Change InputDataset.local_path attribute to reflect full path to dataset file, not just parent directory, consistently with other path-related attributes elsewhere in C-Star

… roms_marbl example case

include coverage badge

- Case now has a "start_date" and "end_date" attribute specified at initialisation, even when reading from a blueprint file (the user's desired run dates are not relevant to the blueprint) - Case now has "valid_start_date" and "valid_end_date" attributes which /are/ defined as part of the blueprint: this is the full range of dates supported by the blueprint - When instantiating a case, the user's start and end dates are checked against the valid start and end dates - InputDataset now has an optional "start_date" and "end_date" attribute/yaml entry to avoid downloading unnecessary files (e.g. forcing files outside the user's desired date range) during Case.setup Still to do: - Modify the runtime namelists using these start and end dates on call to .run() so the simulation actually behaves consistently with the Case it is represented by Other: - Bugfix where ruff was removing an "unused" import of cstar_local_config in environment.py: have added a noqa comment to prevent this for now

- Blueprint discretization entry now has a "time_step" field from which the number of time steps to run with is calculated - ROMSComponent.pre_run() now modifies a template namelist, replacing a series of placeholder strings with paths only to any files needed for the chosen simulation dates - ROMSComponent.run() is now called with an "n_time_steps" arg, based on which it modifies the "__NTIMES_PLACEHOLDER__" string in the namelist at runtime

…d Case

NoraLoose · 2024-08-01T19:58:21Z

README.md

@@ -1,3 +1,4 @@
+[![codecov](https://codecov.io/github/dafyddstephenson/C-Star/branch/automated_testing/graph/badge.svg?token=Z0L4U76WSG)](https://codecov.io/github/dafyddstephenson/C-Star)


Should we change the path so it reflects the coverage of the CWorthy-ocean/C-Star main branch?

good catch! I think it's done now but will have to check it's worked after the merge

TomNicholas · 2024-08-01T20:43:41Z

Nice @dafyddstephenson ! Yes I would love to see some of these PRs broken up haha

add start_date, end_date, valid_start_date, and valid_end_date attributes

This is interesting - is it the first entry in the blueprint that has anything to do with whether a blueprint is "validated"? Because if so then CWorthy-ocean/C-Star#8 is relevant. We should think about whether time domain validity is defined for a whole blueprint, or per-component/dataset or what. I'll suggest that only an entire blueprint can be meaningfully validated, and so we should have a separate top-level section in the blueprint that contains everything relevant to validation status, including valid_start_date, and valid_end_date.

NOTE: we are here adding more metadata to the blueprint that should really be read from the dataset itself. It seems like @TomNicholas VirtualiZarr work could be relevant here?

Also interesting. I think that yes VirtualiZarr could be useful as a way to make it easier to access important information about an input dataset without having to open all the files in it. However while it's tempting to imagine putting cstar-relevant data in the metadata of a virtual zarr store pointing at input netCDF files, I'm not sure that would be the correct separation of concerns (because that additional data is not really part of the dataset, it's our judgement about the dataset).

dafyddstephenson · 2024-08-01T20:53:59Z

Nice @dafyddstephenson ! Yes I would love to see some of these PRs broken up haha

😬 ! This was supposed to be at least two... I'll hold off on any further development until that repo cleaning issue is closed I think. sorry!

This is interesting - is it the first entry in the blueprint that has anything to do with whether a blueprint is "validated"?

It doesn't strictly relate to validation status, though I would say that these entries should be a requirement for validation. In this case for example forcing files are only defined for the year of 2012, so even though this blueprint isn't "validated" per se, it has a natural set of time boundaries that set the valid date range.

However while it's tempting to imagine putting cstar-relevant data in the metadata of a virtual zarr store pointing at input netCDF files, I'm not sure that would be the correct separation of concerns (because that additional data is not really part of the dataset, it's our judgement about the dataset).

Indeed, I wouldn't suggest we add C-Star-related metadata to input datasets, but I think things like just accessing the time bounds would mean we could set start_date and end_date for an input dataset by just reading the metadata, rather than manually typing them into the blueprint file or __init__ call.

TomNicholas

Looks good, let's just separate out any important TODOs as issues then please merge!

…ction in place of copy-pasted code in AdditionalCode.get() and BaseModel.get()

…l_environmet() which is called by environment.py

…l_environmet() which is called by environment.py [typo fix]

dafyddstephenson · 2024-08-08T19:31:19Z

Have now:

changed cstar_local_config.py to define a function set_local_environment() which is called by environment.py rather than it importing cstar_local_config as a whole to force execution
added a _clone_and_checkout utility to utils.py to avoid 3 instances of repeated code in additional_code.py and base_model.py

Have also added some more TODO comments that I'll turn into issues once this is merged (just so I can highlight the code in question on GitHub and create an issue from it directly). Think we're ready to merge.

“Dafydd and others added 20 commits July 24, 2024 18:09

first commit on automated_testing branch: sets up c-star and runs the…

cf16a0a

… roms_marbl example case

add coverage to environment.yml

98e7fcd

add coverage.toml file

1fd2019

move coverage.toml to top level dir

35803f8

add printenv to tests.yaml

c7a2889

modify environment.py so env is configured on github runner

fb8b226

fix path to blueprint in test

56a53f3

fix path to blueprint in test ii

ba58325

add case for linux_x86_64 for exec_pfx in component.py

a205a78

Update README.md

32a1d2b

include coverage badge

Update README.md

d808c57

add comments to environment.py with hints for future development

c18b93e

add dateutil to CI environment

db744e7

changes after pre-commit checks

73d12b2

adjust run legth in test script to 30 mins from 1 day

821052e

tidying after reviewing during PR

326509c

update docstring for ROMSComponent

67f8550

update __str__ functions to reflect new attributes in InputDataset an…

0268749

…d Case

dafyddstephenson mentioned this pull request Aug 1, 2024

Clean up repository & move python package to top level CWorthy-ocean/C-Star#2

Closed

NoraLoose reviewed Aug 1, 2024

View reviewed changes

update codecov badge url in README

27eae81

TomNicholas approved these changes Aug 7, 2024

View reviewed changes

add utility function for cloning/checking out git repos, use this fun…

a191c93

…ction in place of copy-pasted code in AdditionalCode.get() and BaseModel.get()

dafyddstephenson marked this pull request as draft August 8, 2024 18:08

“Dafydd added 2 commits August 8, 2024 12:41

update _clone_and_checkout to print subprocess stderr on failure

ff1be54

add print statement to end of test routine and TODO in base_model

975db23

“Dafydd added 2 commits August 8, 2024 13:18

convert generated cstar_local_config.py to define a function set_loca…

d5bafeb

…l_environmet() which is called by environment.py

convert generated cstar_local_config.py to define a function set_loca…

5f7e5fd

…l_environmet() which is called by environment.py [typo fix]

dafyddstephenson marked this pull request as ready for review August 8, 2024 19:28

update roms_marbl_blueprint commit hash to reflect ucla-roms PR30

f2c402e

dafyddstephenson merged commit 6fcbc6f into CWorthy-ocean:main Aug 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CI and ability to control simulation dates #35

Add CI and ability to control simulation dates #35

dafyddstephenson commented Aug 1, 2024

NoraLoose Aug 1, 2024 •

edited

Loading

dafyddstephenson Aug 1, 2024

TomNicholas commented Aug 1, 2024

dafyddstephenson commented Aug 1, 2024

TomNicholas left a comment

dafyddstephenson commented Aug 8, 2024

		@@ -1,3 +1,4 @@
		[![codecov](https://codecov.io/github/dafyddstephenson/C-Star/branch/automated_testing/graph/badge.svg?token=Z0L4U76WSG)](https://codecov.io/github/dafyddstephenson/C-Star)

Add CI and ability to control simulation dates #35

Add CI and ability to control simulation dates #35

Conversation

dafyddstephenson commented Aug 1, 2024

Summary of changes

1. Introduction of CI:

2. Add ability to control simulation dates

Full description of changes:

1. Introduction of CI:

2. Add ability to control simulation dates

NoraLoose Aug 1, 2024 • edited Loading

Choose a reason for hiding this comment

dafyddstephenson Aug 1, 2024

Choose a reason for hiding this comment

TomNicholas commented Aug 1, 2024

dafyddstephenson commented Aug 1, 2024

TomNicholas left a comment

Choose a reason for hiding this comment

dafyddstephenson commented Aug 8, 2024

NoraLoose Aug 1, 2024 •

edited

Loading