
Fix bug where race condition results in incorrect fields categorization when computing particle_trajectories #4802

Merged · 1 commit · Feb 8, 2024

Conversation

@mtryan83 (Contributor) commented Feb 2, 2024

In the _get_data method of particle_trajectories, self.data_series[0] might still be unloaded (not sure if that's the right word?) when running in parallel. That was causing the if statement that checks whether the particle_type of a missing field exists to fail, incorrectly adding fields to the grid_fields list. Since ds_first is loaded (because we called all_data() on it earlier), checking against ds_first instead is much more likely to succeed.

Essentially

ds_first = self.data_series[0]
dd_first = ds_first.all_data()
# ds_first is now loaded, with fields generated correctly

fds = {}
new_particle_fields = []
for field in missing_fields:
    fds[field] = dd_first._determine_fields(field)[0]
    if field not in self.particle_fields:
        ftype = fds[field][0]
        #### race condition here:
        # if ftype in self.data_series[0].particle_types:
        # Even though ds_first is loaded, the load may not have propagated
        # back to self.data_series[0], so
        # self.data_series[0].particle_types != ds_first.particle_types
        #### fix:
        if ftype in ds_first.particle_types:
            self.particle_fields.append(field)
            new_particle_fields.append(field)

In my case, self.data_series[0].particle_types didn't even contain the all particle type, even though ds_first had all and several others. Let me know if you need more information.
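The failure mode described above can be illustrated with a toy example, independent of yt. One plausible mechanism consistent with the symptoms (not confirmed to be yt's actual behavior) is that indexing the series can hand back a distinct, unloaded object per access, so state set up through one reference is invisible through a later lookup. The DummySeries and DummyDataset names below are hypothetical stand-ins used only to sketch the pattern:

```python
class DummyDataset:
    """Stand-in for a dataset: starts 'unloaded' with minimal particle types."""

    def __init__(self):
        self.particle_types = ("io",)

    def all_data(self):
        # Loading the dataset discovers the full set of particle types.
        self.particle_types = ("io", "all", "PartType0")


class DummySeries:
    """Stand-in for a series whose indexing may return a fresh object."""

    def __getitem__(self, index):
        return DummyDataset()  # a new, unloaded instance on every access


series = DummySeries()
ds_first = series[0]
ds_first.all_data()  # ds_first is now loaded

# Buggy check: re-indexing the series sees an unloaded object,
# so "all" is missing and the field gets miscategorized.
assert "all" not in series[0].particle_types

# Fixed check: use the reference we actually loaded.
assert "all" in ds_first.particle_types
```

This is why the patch compares against the cached ds_first rather than re-indexing self.data_series[0].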

PR Summary

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

Unfortunately, I don't know why this wasn't picked up in the tests; I would have expected it to be caught by test_default_field_tuple.

@mtryan83 (Contributor, Author) commented Feb 2, 2024

The failing tests appear to be unrelated.

@neutrinoceros (Member) left a comment

Your fix is absolutely reasonable, but I don't really understand in what situation this race condition could happen. I don't think we enable any parallelism in tests, which may be why we never caught this in CI.

@mtryan83 (Contributor, Author) commented Feb 5, 2024

I don't really understand in what situation this race condition could happen?

I honestly don't know why it's happening either. All I know is that my processing chain broke when I switched from a single-CPU system to a multi-CPU system (no other changes; I didn't even explicitly enable parallelism), and diving into the code showed that self.data_series[0].particle_types only contained io, while ds_first had the expected particle types. Since ds_first is defined as ds_first = self.data_series[0] literally 9 lines up, I figured this would be the correct fix.

I'll try to look into it more over the next couple of days, as I have time. If it helps, the time series was composed of 4 adjacent GIZMO runs with ~10^5 particles each, and essentially all I was doing was loading the files into a DatasetSeries (passing a list of files to the DatasetSeries constructor rather than going through yt.load(), so maybe that's part of the problem? Though that's allowed according to the docs) and running ts.particle_trajectories() on it.

@neutrinoceros (Member) commented

That's weird! Anyhow, I don't think we should require an in-depth explanation to merge this (though it would still be nice to have). I'll leave it open in case anyone wants to join the discussion, but we'll want to merge it in time for yt 4.3.1.

@matthewturk matthewturk merged commit c1bd57a into yt-project:main Feb 8, 2024
12 of 13 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/yt that referenced this pull request Feb 8, 2024
…incorrect fields categorization when computing particle_trajectories
neutrinoceros added a commit that referenced this pull request Feb 8, 2024
…2-on-yt-4.3.x

Backport PR #4802 on branch yt-4.3.x (Fix bug where race condition results in incorrect fields categorization when computing particle_trajectories)
neutrinoceros added a commit that referenced this pull request Feb 10, 2024
…ition results in incorrect fields categorization when computing particle_trajectories)"
3 participants