Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Animal - Feature/multi grid occupancy #530

Merged
merged 38 commits into from
Dec 4, 2024

Conversation

TaranRallings
Copy link
Collaborator

@TaranRallings TaranRallings commented Jul 26, 2024

Description

EDIT - I have put this up in addition to a multi-grid + refactor draft PR at #538 . Before we do a deep review of the actual multi-grid content, I would appreciate folks taking a look at the refactor format. I am trying to decide whether to stick with this current version, built around the model, community, cohort, and territory classes, or whether to shift to the refactor that abandons the AnimalCommunity class and moves its content into AnimalModel and AnimalCohort.

This is code changes to the animal model that allow a cohort to simultaneously occupy more than one grid cell. This is done through an AnimalTerritory class that is attached to a given cohort and stores the information about the central grid of the territory, the full set of grids in the territory, and what plants and pools exist within it.

Changes build on the Madingley assumption of a cohort being evenly distributed within a grid cell. So a cohort is now evenly distributed within its territory. This creates some hiccups in some equations and I have made an issue to tackle this soon.

AnimalCommunity now stores animal cohort information in two ways.

  1. which cohorts have their territory centroid in the community (being both large and single grid territories)
  2. which cohorts have a partial presence in the territory and what proportion of the total cohort is there. (occupancy)

Running the AnimalModel cycles through communities, as before, but now each community cycles through those cohorts having their centroid in the community, not merely occupancy. Occupancy is used for foraging and excretion.

Basically any method that involved space, foraging, or excretion had to be changed to run off territories instead of a single community. I have been staring at it for too long to see errors now so I expect some problems snuck in.

I did some inelegant hacky stuff to make the circular import issue of community -> cohort -> territory -> community work. Please let me know if the current Frankenstein's monster of protocols and delayed imports is unworkable or if you have a cleaner solution.

Apologies for the size of this one!

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

…adds default classes for territory and community.
@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2024

Codecov Report

Attention: Patch coverage is 85.54688% with 37 lines in your changes missing coverage. Please review.

Project coverage is 94.42%. Comparing base (72ef3de) to head (b979702).

Files Patch % Lines
...tual_ecosystem/models/animal/animal_communities.py 75.00% 21 Missing ⚠️
virtual_ecosystem/models/animal/protocols.py 74.28% 9 Missing ⚠️
...rtual_ecosystem/models/animal/scaling_functions.py 66.66% 5 Missing ⚠️
virtual_ecosystem/models/animal/animal_cohorts.py 95.12% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #530      +/-   ##
===========================================
- Coverage    94.96%   94.42%   -0.54%     
===========================================
  Files           73       74       +1     
  Lines         4071     4287     +216     
===========================================
+ Hits          3866     4048     +182     
- Misses         205      239      +34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@jacobcook1995 jacobcook1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar enough with the animal code to give particularly helpful feedback on the overall structure. I did have a few questions about how excretion now works, also I think it would be good to have that documented somewhere

excreta_mass * self.constants.nitrogen_excreta_proportion
)
# the number of communities over which the feces are to be distributed
number_communities = len(excrement_pools)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in thinking that a community == the set of species in a simulation grid cell?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right. The community class handles the collective group of animals living, wholly or partially, within a grid cell.


excreta_mass_per_community = (
excreta_mass / number_communities
) * self.constants.nitrogen_excreta_proportion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An this basically means that a cohort distributes its excrement evenly over the grid cells than comprise its territory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Current assumptions have waste deposited evenly across a cohort's territory.

Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel like the structure is rather dense and those internal imports suggest there is something up, but it's a complex PR, so I'm going to bank these comments before I loose them and think more deeply.

Comment on lines +132 to +138
AnimalTerritory = importlib.import_module(
"virtual_ecosystem.models.animal.animal_territories"
).AnimalTerritory

bfs_territory = importlib.import_module(
"virtual_ecosystem.models.animal.animal_territories"
).bfs_territory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not just use from virtual_ecosystem.models.animal.animal_territories import bfs_territory here? You can import within a function. Having said that - this doesn't seem optimal!

Comment on lines +166 to +184
def reinitialize_territory(
self,
cohort: AnimalCohort,
centroid_key: int,
get_community_by_key: Callable[[int], AnimalCommunity],
) -> None:
"""This initializes the territory occupied by the cohort.

TODO: update the territory size to cell number conversion using grid size

Args:
cohort: The animal cohort occupying the territory.
centroid_key: The community key anchoring the territory.
get_community_by_key: The method for accessing animal communities by key.
"""
# remove existing occupancies
cohort.territory.abandon_communities(cohort)
# reinitialize the territory
self.initialize_territory(cohort, centroid_key, get_community_by_key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this would be easier as a single extra abandon argument to initialise_territory?

Comment on lines +88 to +94
@pytest.fixture
def mock_carcass_pool(self, mocker):
"""Fixture for a mock CarcassPool."""
mock_pool = mocker.Mock()
mock_pool.scavengeable_energy = 10000.0
mock_pool.decomposed_energy = 0.0
return mock_pool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mock it? It's a small enough resource that it looks like you can simply use:

from virtual_ecosystem.models.animal.decay import CarcassPool
return CarcassPool(scavengeable_energy=10000.0, decomposed_energy=0)

Comment on lines +108 to +141
@pytest.fixture
def animal_territory_instance_1(self, mock_get_community_by_key):
"""Fixture for the first animal territory with mock get_community_by_key."""
from virtual_ecosystem.models.animal.animal_territories import AnimalTerritory

return AnimalTerritory(
centroid=0,
grid_cell_keys=[1, 2, 3],
get_community_by_key=mock_get_community_by_key,
)

@pytest.fixture
def animal_territory_instance_2(self, mock_get_community_by_key):
"""Fixture for the second animal territory with mock get_community_by_key."""
from virtual_ecosystem.models.animal.animal_territories import AnimalTerritory

return AnimalTerritory(
centroid=1,
grid_cell_keys=[2, 3, 4],
get_community_by_key=mock_get_community_by_key,
)

def test_find_intersecting_carcass_pools(
self,
animal_territory_instance_1,
animal_territory_instance_2,
mock_carcass_pool,
):
"""Test for find_intersecting_carcass_pools method."""
intersecting_pools = (
animal_territory_instance_1.find_intersecting_carcass_pools(
animal_territory_instance_2
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.fixture
def animal_territory_instance_1(self, mock_get_community_by_key):
"""Fixture for the first animal territory with mock get_community_by_key."""
from virtual_ecosystem.models.animal.animal_territories import AnimalTerritory
return AnimalTerritory(
centroid=0,
grid_cell_keys=[1, 2, 3],
get_community_by_key=mock_get_community_by_key,
)
@pytest.fixture
def animal_territory_instance_2(self, mock_get_community_by_key):
"""Fixture for the second animal territory with mock get_community_by_key."""
from virtual_ecosystem.models.animal.animal_territories import AnimalTerritory
return AnimalTerritory(
centroid=1,
grid_cell_keys=[2, 3, 4],
get_community_by_key=mock_get_community_by_key,
)
def test_find_intersecting_carcass_pools(
self,
animal_territory_instance_1,
animal_territory_instance_2,
mock_carcass_pool,
):
"""Test for find_intersecting_carcass_pools method."""
intersecting_pools = (
animal_territory_instance_1.find_intersecting_carcass_pools(
animal_territory_instance_2
)
)
@pytest.fixture
def animal_territory_instances(self, mock_get_community_by_key):
"""Fixture for the first animal territory with mock get_community_by_key."""
from virtual_ecosystem.models.animal.animal_territories import AnimalTerritory
return (
AnimalTerritory(
centroid=0,
grid_cell_keys=[1, 2, 3],
get_community_by_key=mock_get_community_by_key,
),
AnimalTerritory(
centroid=1,
grid_cell_keys=[2, 3, 4],
get_community_by_key=mock_get_community_by_key,
)
)
def test_find_intersecting_carcass_pools(
self,
animal_territory_instances,
mock_carcass_pool,
):
"""Test for find_intersecting_carcass_pools method."""
t1, t2 = animal_territory_instances
intersecting_pools = t1.find_intersecting_carcass_pools(t2)

It might even be making the fixture in the conftest.py provide these two and just reuse it. You can always ditch one when it isn't needed: t1, _ = animal_community_instances

@dalonsoa
Copy link
Collaborator

I was about to review this, but found out about #538 about a similar topic. Which should a review first? Or should I wait?

@TaranRallings
Copy link
Collaborator Author

I was about to review this, but found out about #538 about a similar topic. Which should a review first? Or should I wait?

Just added an edit to this to clarify!

Before I get a proper review on the multi-grid functionality, I need some help deciding which format would be better going forward. If you could give me some comparative structural feedback on the #538 first, that would be great.

@davidorme
Copy link
Collaborator

@dalonsoa I had a long chat with @TaranRallings on Teams about this implementation before I went on leave and I think that context is missing here, so I'm going to duplicate my stream of consciousness here to capture the rationale for the parallel multi grid PRs:

The heart of the issue is that we have a stack of cohorts (which can now live in multiple cells) and we have to go through them all in some order and find out:

  • where they shit (into the ExcrementPool of each cell in their territory)
  • what primary productivity they eat (from the PlantResources of each cell in their territory)
  • where they die (into the CarcassPool of each cell in their territory) and
  • what prey animals they intersect with and (can therefore kill and strew bodies around the CarcassPools of their intersecting territories).

Does the order of resolution of the communities [ETA - ie. the specific sets of cohorts found in one cell] actually matter? We want to run each cohort, but is it actually desirable to run cohorts in groups by community [ETA - ie cell], or would it be equally good to just run all cohorts randomly regardless of community? Of course, we could define a cohort ordering that reflects their shared co-occurrence?

I think where I'm going with this is that the Community was a natural thing in the single grid occupancy model. The cohorts only occur in a single cell and can only eat things that also occur in that cell, die in that cell and crap in that cell. So - group them into a community.

But we've now got something fundamentally different with multi-grid occupancy and the Community feels a bit like a strait-jacket. The resource and decay pools are all nested inside them and we're pinning cohorts to a more or less arbitrary 'home' Community.

Is an alternative is to have lists of the pools that are 1-to-1 with cells (Excrement, Carcass, Plants) [ETA - as properties of the AnimalModel] and then just have a list of animal cohorts with territories - the animal cohorts are the only thing that can span cells. We'd end up with something a little like this very dumb model:

import random
from dataclasses import dataclass, field
from uuid import uuid4, UUID

import numpy as np


@dataclass
class Cohort:
    size: float
    herbivore: bool
    territory: set[int]
    n: int
    uuid: UUID = field(init=False)

    def __post_init__(self):
        self.uuid = uuid4()


@dataclass
class Plants:
    leaves: int = 10


@dataclass
class Corpses:
    n_corpses: int = 0


@dataclass
class Crap:
    amount: int = 0


class Dumb:
    def __init__(self, n_cells=9):
        self.cell_ids = np.arange(n_cells)
        self.plants = [Plants() for idx in self.cell_ids]
        self.crap = [Crap() for idx in self.cell_ids]
        self.corpses = [Corpses() for idx in self.cell_ids]

        self.cohorts = [
            Cohort(size=10, herbivore=False, territory={0, 1, 2, 3, 4, 5}, n=2),
            Cohort(size=5, herbivore=False, territory={3, 4, 5, 6, 7}, n=20),
            Cohort(size=1, herbivore=True, territory={1, 2, 3, 4, 5, 6, 7}, n=200),
        ]

    def step(self):
        # Loop over cohorts
        for focal_cohort in self.cohorts:
            # Do herbivory
            if focal_cohort.herbivore:
                for cell in focal_cohort.territory:
                    self.plants[cell].leaves -= 1

            # Do carnivory - each individual eats 1 of each smaller cohort
            if not focal_cohort.herbivore:
                for target_cohort in self.cohorts:
                    if (focal_cohort.uuid != target_cohort.uuid) and (
                        focal_cohort.size > target_cohort.size
                    ):
                        # Do these cohorts overlap in space?
                        overlap = focal_cohort.territory.intersection(
                            target_cohort.territory
                        )

                        if overlap:
                            target_cohort.n -= focal_cohort.n
                            for corpses in np.arange(focal_cohort.n):
                                corpse_cell = random.sample(list(overlap), 1)[0]
                                self.corpses[corpse_cell].n_corpses += 1

            # Do crapping
            total_crap_per_cell = (focal_cohort.size * 0.1 * focal_cohort.n) / len(
                focal_cohort.territory
            )
            for cell_id in focal_cohort.territory:
                self.crap[cell_id].amount += total_crap_per_cell


dumb = Dumb()
dumb.step()
dumb.step()
dumb.step()
dumb.step()

print(dumb.corpses)
print(dumb.plants)
print(dumb.crap)
print(dumb.cohorts)

Those classes seem less encumbered and circular - but there may be something critical we lose here.

I'm going to have to avoid thinking about the cohort overlap question. Too many interesting avenues. I did just want to dump another approach though. The set intersection solution above probably isn't too slow, but it is going to be used a simply huge number of times. The problem is that the number of possible intersection checks scales as the square of the number of cohorts, which isn't ideal.

I'm vaguely wondering if it might be worth retaining the communities [ETA - ie the community structure] as per cell id lists of cohort IDs? That might be faster for getting to overlapping cohorts - although there is then the cost of maintaining that index as cohorts move. That might be worth it though.

import random
from dataclasses import dataclass, field
from uuid import uuid4, UUID

import numpy as np


@dataclass
class Cohort:
    size: float
    herbivore: bool
    territory: set[int]
    n: int
    uuid: UUID = field(init=False)

    def __post_init__(self):
        self.uuid = uuid4()


@dataclass
class Plants:
    leaves: int = 10


@dataclass
class Corpses:
    n_corpses: int = 0


@dataclass
class Crap:
    amount: int = 0


class Dumb:
    def __init__(self, n_cells=9):
        self.cell_ids = np.arange(n_cells)
        self.plants = [Plants() for idx in self.cell_ids]
        self.crap = [Crap() for idx in self.cell_ids]
        self.corpses = [Corpses() for idx in self.cell_ids]

        self.cohorts: dict[UUID, Cohort] = {}

        self.communities = [set() for idx in self.cell_ids]

    def add_cohort(self, cohort: Cohort) -> None:
        # add the cohort and register the cohort in the communities for its territory
        self.cohorts[cohort.uuid] = cohort
        for cell_id in cohort.territory:
            self.communities[cell_id].add(cohort.uuid)

    def step(self):
        # Loop over cohorts
        for f_uuid, focal_cohort in self.cohorts.items():
            # Do herbivory
            if focal_cohort.herbivore:
                for cell in focal_cohort.territory:
                    self.plants[cell].leaves -= 1

            # Do carnivory - each individual eats 1 of each smaller cohort
            if not focal_cohort.herbivore:
                # Find communities overlapping focal cohort
                cooccur = [self.communities[idx] for idx in focal_cohort.territory]
                target_uuids = set([uuid for cell in cooccur for uuid in cell])

                for t_uuid in target_uuids:
                    target_cohort = self.cohorts[t_uuid]
                    if (focal_cohort.uuid != target_cohort.uuid) and (
                        focal_cohort.size > target_cohort.size
                    ):
                        # Get the overlap - it is still probably faster to do this to
                        # get the intersection now that they are known to overlap?
                        overlap = focal_cohort.territory.intersection(
                            target_cohort.territory
                        )

                        target_cohort.n -= focal_cohort.n
                        for corpses in np.arange(focal_cohort.n):
                            corpse_cell = random.sample(list(overlap), 1)[0]
                            self.corpses[corpse_cell].n_corpses += 1

            # Do crapping
            total_crap_per_cell = (focal_cohort.size * 0.1 * focal_cohort.n) / len(
                focal_cohort.territory
            )
            for cell_id in focal_cohort.territory:
                self.crap[cell_id].amount += total_crap_per_cell


dumb = Dumb()


dumb.add_cohort(Cohort(size=10, herbivore=False, territory={0, 1, 2, 3, 4, 5}, n=2))
dumb.add_cohort(Cohort(size=5, herbivore=False, territory={3, 4, 5, 6, 7}, n=20))
dumb.add_cohort(Cohort(size=1, herbivore=True, territory={1, 2, 3, 4, 5, 6, 7}, n=200))

dumb.step()
dumb.step()
dumb.step()
dumb.step()

print(dumb.corpses)
print(dumb.plants)
print(dumb.crap)
print(dumb.cohorts)

I think the technical aspect of how best to do that overlap is one for the RSE team 😄

@davidorme
Copy link
Collaborator

davidorme commented Aug 28, 2024

@dalonsoa If we decouple AnimalCohorts from specific cells (which we need to do to allow large animals to occupy many cells) then we end up needing to coordinate indexing from $N$ grid cell ids to $M$ animal cohorts. Although $N$ is smallish and fixed, $M$ varies (as cohorts are born and die) and could potentially get quite large.

Finding the cell specific pools for plants, carcasses and excrement is just indexing across cell ids from the cohort territory (set of cell ids where the cohort exists) but for carnivory we want to be able to quickly extract which cohorts overlap each other. The ranges may be sufficiently dynamic that it this is unavoidably a per step operation, but we'd like to optimise how best to find that. Although an $N \times M$ matrix (probably sparse because many cohorts will have small territories) looks attractive, it will vary in shape very frequently and might be rather large, but maybe if that gets created once per time step then it can form the basis of the indexing for that time step? Unless of course ranges change during the process rather than at the end! Not sure what is optimal here.

@dalonsoa
Copy link
Collaborator

dalonsoa commented Aug 28, 2024

It's an interesting problem :)

I think the second approach is neater, but I'm a bit worried on the complication of keeping the communities list updated when there are tons of cohorts moving around a large number of cells. Not being for that, I would choose that one.

The first approach, on the other hand, is simpler and possibly slower if things don't move, but you can see in the code that it is not scaling as the square of the number of cohorts: you have a first filter based on the cohort size, for example, so the intersection will need to be checked way less than the square of cohorts.

I guess that in a real scenario the filters to decide what other things get eaten by a cohort will be different, but if they are applied in the right order, you might cut the search space pretty fast, only applying the heaviest filters to a handful of cohort pairs. Combined with efficient tools like list comprehensions, itertools, functools, etc., I don't think this should be a barrier.

So, something like the following, for this dummy case, could be pretty efficient:

from functools import reduce

def is_different(one, two):
    return one.uuid != two.uuid

def is_bigger(one, two):
    return one.size > two.size

def overlap(one, two):
    return len(focal_cohort.territory.intersection(target_cohort.territory)) != 0

def apply_filter(focal, cohorts, filter):
    return [target for target in cohorts if filter(focal, target)]

filters = [is_different, is_bigger, overlap]

for focus in all_cohorts:
    food = reduce(lambda cohorts, filter: apply_filter(focal, cohorts, filter), filters, all_cohorts)
    crap, corpses = focus.eat(food)

Or something along these lines. I'm sure you get the idea.

@dalonsoa
Copy link
Collaborator

Some filters like if a cohort if within the "menu" of the focus cohort (eg. "insects" for a cohort of a species of birds), will be really efficient to put a the beginning of the list and cutting it down to a minimum the range of options.

Also, in the example above, probably we can skip the is_different filter because the is_bigger filter is already getting rid of the focus cohort itself from the list.

Obviously, applying filters this way can be applied equally to the second example where you keep track of the territory overlap, but I don't think it is worth the effort to keep that filter - because that is all it is, in practice - as a special case, unless that information needs to be kept for other reasons.

@davidorme
Copy link
Collaborator

davidorme commented Aug 29, 2024

I think that filter approach looks really promising. I could see the functional types being defined with a set of named prey filters and then having a simple registry of filter functions that users could expand to add new prey selection modes. We'd likely need some and/or logic that we could apply to filters or possibly allow sets of filters, but I can see that process working well.

The overlap filter is a special case because I think we would always want that to return the contents of focal_cohort.territory.intersection(target_cohort.territory). But equally, we are only ever going to want to select prey that overlaps, so that might be a separate step after the previous filtering. @TaranRallings ?

@TaranRallings
Copy link
Collaborator Author

There is quite a bit going on here so let me know if I have misunderstood something.

  • we think ditching the community class is a good idea
  • we are worried about indexing issues (there is a reason foraging ate up all the time in Madingley)
  • @dalonsoa 's filtering set up is probably a good flexible structure for handling prey selection going forward

Unordered Notes:

  • foraging cannot be collapsed into 1 prey item at a time without reworking the foraging math (delta_mass_predation) as it is built on cumulative prey densities.
  • carcasses are handled at the intersection of territories because that is where prey fall happens
  • a full system community matrix would need to be regenerated at multiple points within a timestep as it changes due to: local migration, immigration/emigration, birth, death, and metamorphosis all of which are handled separately.
  • prey selection is a toy version of body-size relationships and I won't be adding significant detail to it until later in development.
  • set (and other oddities) are in Animal - Feature/multi grid no community #538 because I was trying to get something basically comprehensible out so we could make a decision asap and I wouldn't lose too much time on the refactor. In the case of set, I wanted it for intersection() and then decided to roll with it for related structures.

If we agree on the no-community version then I will move forward in development there.

@davidorme
Copy link
Collaborator

@TaranRallings That's all clear and I think that "no community object" implementation in #538 is the one to pursue (unless @dalonsoa spots something!).

I hadn't spotted that cumulative density issue in delta_mass_predation. At the moment we have a list of target cohorts in delta_mass_predation and then - for each one - we dive down through calculate_consumed_mass_predation, F_i_j_individual and then theta_i_j(animal_list).

So that theta_i_j is calculated repeatedly, but is that mandatory because predation of subsequent cohorts needs to account for the reduced density of already predated targets? I was wondering if we could calculate theta_i_j once across the list of targets, but I guess that is then implicitly a model where predation is parallel rather than sequential.

@TaranRallings
Copy link
Collaborator Author

@davidorme ok great. I need to charge ahead on the scheduled work so I'll get to that. I'll make an issue to take a look at other indexing possibilities after the animal model is fully up and running.

@dalonsoa
Copy link
Collaborator

@TaranRallings , let me know when you want me to review this. As I'm still tagged from last time, I'm not sure if this is ready or not.

@TaranRallings TaranRallings merged commit b979702 into develop Dec 4, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants