
Fix bug where race condition results in incorrect fields categorization when computing particle_trajectories #4802

Merged · 1 commit · Feb 8, 2024

Conversation

@mtryan83 (Contributor) commented Feb 2, 2024

In the _get_data method of particle_trajectories, self.data_series[0] might still be unloaded (not sure if that's the right word?) when running in parallel. That was causing the if statement that checks whether the particle_type of a missing field exists to fail, incorrectly adding fields to the grid_fields list. Since ds_first is loaded (because we called all_data() on it earlier), checking against ds_first instead is much more likely to succeed.

Essentially

ds_first = self.data_series[0]
dd_first = ds_first.all_data()
# ds_first is now loaded, with fields generated correctly

fds = {}
new_particle_fields = []
for field in missing_fields:
    fds[field] = dd_first._determine_fields(field)[0]
    if field not in self.particle_fields:
        ftype = fds[field][0]
        #### race condition here:
        # if ftype in self.data_series[0].particle_types:
        # Even though ds_first is loaded, the load may not have propagated
        # back to self.data_series[0], so
        # self.data_series[0].particle_types != ds_first.particle_types
        #### fix:
        if ftype in ds_first.particle_types:
            self.particle_fields.append(field)
            new_particle_fields.append(field)

In my case, self.data_series[0].particle_types didn't even contain the all particle type, even though ds_first had all and several others. Let me know if you need more information.
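The failure mode described above can be illustrated with a toy example, independent of yt. One plausible mechanism consistent with the symptoms (not confirmed to be yt's actual behavior) is that indexing the series can hand back a distinct, unloaded object per access, so state set up through one reference is invisible through a later lookup. The DummySeries and DummyDataset names below are hypothetical stand-ins used only to sketch the pattern:

```python
class DummyDataset:
    """Stand-in for a dataset: starts 'unloaded' with minimal particle types."""

    def __init__(self):
        self.particle_types = ("io",)

    def all_data(self):
        # Loading the dataset discovers the full set of particle types.
        self.particle_types = ("io", "all", "PartType0")


class DummySeries:
    """Stand-in for a series whose indexing may return a fresh object."""

    def __getitem__(self, index):
        return DummyDataset()  # a new, unloaded instance on every access


series = DummySeries()
ds_first = series[0]
ds_first.all_data()  # ds_first is now loaded

# Buggy check: re-indexing the series sees an unloaded object,
# so "all" is missing and the field gets miscategorized.
assert "all" not in series[0].particle_types

# Fixed check: use the reference we actually loaded.
assert "all" in ds_first.particle_types
```

This is why the patch compares against the cached ds_first rather than re-indexing self.data_series[0].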

PR Summary

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

Unfortunately, I don't know why this wasn't picked up in the tests; I would have expected it to be caught by test_default_field_tuple.

@mtryan83 (Contributor, Author) commented Feb 2, 2024

The failing tests appear to be unrelated.

@neutrinoceros (Member) left a comment

Your fix is absolutely reasonable, but I don't really understand in what situation this race condition could happen. I don't think we enable any parallelism in tests, which may be why we never caught this in CI.

@mtryan83 (Contributor, Author) commented Feb 5, 2024

I don't really understand in what situation this race condition could happen?

I honestly don't know why it's happening either. All I know is that my processing chain broke when I switched from a single-CPU system to a multi-CPU system (no other changes; I didn't even explicitly enable parallelism), and diving into the code showed that self.data_series[0].particle_types only contained io, while ds_first had the expected particle types. Since ds_first is defined as ds_first = self.data_series[0] literally 9 lines up, I figured this would be the correct fix.

I'll try to look into it more over the next couple of days, as I have time. If it helps, the time series was composed of 4 adjacent GIZMO runs with ~10^5 particles each, and essentially all I was doing was loading the files into a DatasetSeries (passing a list of files to the DatasetSeries constructor rather than going through yt.load(), so maybe that's part of the problem? Though that's allowed according to the docs) and running ts.particle_trajectories() on it.

@neutrinoceros (Member) commented

That's weird! Anyhow, I don't think we should require an in-depth explanation to merge this (though it would still be nice to have). I'll leave it open in case anyone wants to join the discussion, but we'll want to merge it in time for yt 4.3.1.

@matthewturk matthewturk merged commit c1bd57a into yt-project:main Feb 8, 2024
12 of 13 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/yt that referenced this pull request Feb 8, 2024
…incorrect fields categorization when computing particle_trajectories
neutrinoceros added a commit that referenced this pull request Feb 8, 2024
…2-on-yt-4.3.x

Backport PR #4802 on branch yt-4.3.x (Fix bug where race condition results in incorrect fields categorization when computing particle_trajectories)
neutrinoceros added a commit that referenced this pull request Feb 10, 2024
…ition results in incorrect fields categorization when computing particle_trajectories)"
3 participants