Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Main timing loop #121

Merged
merged 20 commits into from
Dec 2, 2022
Merged

Main timing loop #121

merged 20 commits into from
Dec 2, 2022

Conversation

jacobcook1995
Copy link
Collaborator

Description

I've added a timing loop to the main function which runs the Virtual Rainforest model. This loop controls which models should be updated on a given time step. I've also added a few functions which check that timing options work sensibly together, and warn the user when they probably don't. To facilitate all this the time check function defined in BaseModel was modified, and a new function was added which starts the timing.

One thing it would be useful to get feedback on is whether the way I divide models between to_refresh and fixed in get_models_to_update is efficient. Because at some point these are presumably going to be fairly large objects, so if I'm copying them in an inefficient manner the performance overhead might be quite large.

Anyway, let me know your thoughts.

Fixes #103

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is expensively set up at the moment. All we really need to do in this loop is to tick forward rapidly and cheaply check whether the any of the models are now out of date. I've sketched an idea of what I'm thinking about in the comments.

Comment on lines 253 to 261
while current_time < end_time:

current_time += update_interval

to_refresh, fixed = get_models_to_update(current_time, models_cfd)

# TODO - Solve models to steady state

models_cfd = to_refresh + fixed
Copy link
Collaborator

@davidorme davidorme Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by this - there is a lot of model passing around for what need to be some frequent checks. And I'm also not convinced by the separate time 'grain' for the models and update_interval.

  • I think update_interval should be small (let's say 1 minute). There is a cost to repeating this loop at a fast frequency but all it should be doing really is keeping track of changing time.
  • The whole get_models_to_update is expensive. I think we should track just a set of next_updatetimes. Something like this:
# Get the list of date times of the next update.
update_due = {mod.name: mod.next_update for mod in models_cfd}

# Loop until the end
while current_time < end_time:
    # Get the names of models that have expired due dates
    update_needed = [nm for nm, upd in update_due.items() if upd <= current_time ]

    # If there are expired model, run their update() method and update their update due dates
    if update_needed:
        for mod_nm in update_needed:
            models_cfg[mod_nm].update()
            update_due[mod_nm] = models_cfg[mod_nm].next_update()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a huge issue in here with models running at different rates and how to synchronize them. Ultimately that would be great to actually do but we need to kick this massively into touch for now and enforce synchronised model updates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree that using list comprehensions here seems way less costly and is probably the way to go. Not sure I agree that the update interval should go down to 1 minute though as even though it's only tracking time, there's a few steps (increment time, list comp, if statement) per minute tick to do so. Don't think using a really small time step in the main loop is relevant, unless we are expecting models to communicate at that time scale, but in that case I think it's questionable whether they should actually be separate models

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, but I suspect that cost will be trivial compared to the update() costs! I've just tried running the loop (including list comprehension due date check) using 1 hour intervals and a century took about 10 seconds, so minutes would probably be a bit much (guessing at 10 minutes).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [169]: print(end - start)
0:10:31.752243

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like very much David's alternative. It should definitely be faster than copying models around. It can also be simplified slightly:

# Get the list of date times of the next update.
update_due = {mod.name: mod.next_update for mod in models_cfd}

# Loop until the end
while current_time < end_time:
    # Get the names of models that have expired due dates
    update_needed = [nm for nm, upd in update_due.items() if upd <= current_time ]

    # If there are expired model, run their update() method and update their update due dates
    for mod_nm in update_needed:
        models_cfg[mod_nm].update()
        update_due[mod_nm] = models_cfg[mod_nm].next_update()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, yes. Entirely redundant if statement 😞

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments. David's done most of the hard work.

Comment on lines 253 to 261
while current_time < end_time:

current_time += update_interval

to_refresh, fixed = get_models_to_update(current_time, models_cfd)

# TODO - Solve models to steady state

models_cfd = to_refresh + fixed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like very much David's alternative. It should definitely be faster than copying models around. It can also be simplified slightly:

# Get the list of date times of the next update.
update_due = {mod.name: mod.next_update for mod in models_cfd}

# Loop until the end
while current_time < end_time:
    # Get the names of models that have expired due dates
    update_needed = [nm for nm, upd in update_due.items() if upd <= current_time ]

    # If there are expired model, run their update() method and update their update due dates
    for mod_nm in update_needed:
        models_cfg[mod_nm].update()
        update_due[mod_nm] = models_cfg[mod_nm].next_update()

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I've just have a very minor question.

# This is just a step to pass flake8 checks (DELETE LATER)
print(models_cfd)
# Define the basic model time grain (set to 10 minutes for now)
update_interval = timedelta64(10, "m")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something you want to hardcode rather than reading it from the configuration?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my worry was with users setting an update interval that wasn't small enough to capture the timing of models well, i.e. setting it to be 3 weeks when one model updates every three weeks while another updates monthly would generate undesirable behaviour.

Can definitely see the value in this being configurable though, I'll probably add it to the config schema with the default set to 10 minutes, and the description for the user warning them to set it significantly lower than the model time step

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2022

Codecov Report

Merging #121 (8544bc8) into develop (89407ed) will decrease coverage by 1.66%.
The diff coverage is 79.59%.

@@             Coverage Diff             @@
##           develop     #121      +/-   ##
===========================================
- Coverage    95.88%   94.21%   -1.67%     
===========================================
  Files           11       11              
  Lines          437      467      +30     
===========================================
+ Hits           419      440      +21     
- Misses          18       27       +9     
Impacted Files Coverage Δ
virtual_rainforest/main.py 81.53% <72.22%> (-9.09%) ⬇️
virtual_rainforest/core/model.py 100.00% <100.00%> (ø)
virtual_rainforest/soil/model.py 94.73% <100.00%> (+0.29%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - the only ting I'd have picked up is the hardcoded step interval and that's fixed now!

@jacobcook1995 jacobcook1995 merged commit 157157e into develop Dec 2, 2022
@jacobcook1995 jacobcook1995 deleted the feature/basic_timing_loop branch December 2, 2022 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timing of overall simulation
4 participants