Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637

Merged
merged 11 commits into from
Jun 24, 2024
1 change: 1 addition & 0 deletions satpy/etc/readers/satpy_cf_nc.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,4 @@ file_types:
file_patterns:
- '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
- '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
- '{filename}.nc'
djhoese marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the GRIB2 reader, could this be {stem}.nc? stem would also be more accurate, as .nc is part of the filename.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerritholl I would be ok with that. I just added filename based on the generic_image reader config.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for stem also. The generic_image can be updated later.

2 changes: 1 addition & 1 deletion satpy/etc/writers/cf.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@ writer:
name: cf
description: Generic netCDF4/CF Writer
writer: !!python/name:satpy.writers.cf_writer.CFWriter
filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc'
filename: '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it was mentioned in the comments, but does anyone know/remember where this - separate filename came from? As you mentioned it seems to have existed in the reader first, but it isn't clear to me why/where from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my question here still stands, but not sure we're going to get anyone who has a real answer.

compress: DEFLATE
zlevel: 6
2 changes: 1 addition & 1 deletion satpy/tests/test_readers.py
Original file line number Diff line number Diff line change
Expand Up @@ -751,7 +751,7 @@ def test_no_reader(self):

# without files it's going to be an empty result
assert group_files([]) == []
groups = group_files(self.g16_files)
groups = group_files(self.g16_files, reader='abi_l1b')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this has to do with this PR. Moreover, the docstring of this test says that the we want to check that we can skip the reader argument, and with change does the opposite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BENR0 can you comment on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mraspaud what you are saying makes sense to me. I honestly didn't check what that test does I just made this change based on @djhoese suggestion so maybe he can shed more light on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa I did not realize what the test was actually trying to accomplish. This is one downside of the generic filename in the reader. It matches everything and has the change to "steal" matches from other readers. Either:

  1. This test gets updated to expect 1 group of files (not great).
  2. The reader gets updated with group keys to sort by start time...which might work and fix this test.
  3. Remove the default file pattern from the reader

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well obviously I am against option 3. :-D. I think removing a "feature" because a test fails is not the way to go. I am not sure which of the remaining options is the best way to move forward.

Copy link
Member

@gerritholl gerritholl Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Satpy needs a strategy to choose a reader if multiple readers match a file pattern. I think the current strategy is that the first reader with a match wins, but the order of readers is not explicitly defined?

Copy link
Member

@djhoese djhoese Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reader order may come down to:

reader_configs = glob_config(os.path.join('readers', '*.yaml'), search_dirs=paths)

in most cases. There may be others. It may get even more complicated when we look at that glob_configs function since it is searching not only Satpy's configs but any user configured paths too:

satpy/satpy/_config.py

Lines 156 to 184 in c0b69be

def config_search_paths(filename, search_dirs=None, **kwargs):
"""Get series of configuration base paths where Satpy configs are located."""
if search_dirs is None:
search_dirs = get_config_path_safe()[::-1]
paths = [filename, os.path.basename(filename)]
paths += [os.path.join(search_dir, filename) for search_dir in search_dirs]
paths += [os.path.join(PACKAGE_CONFIG_PATH, filename)]
paths = [os.path.abspath(path) for path in paths]
if kwargs.get("check_exists", True):
paths = [x for x in paths if os.path.isfile(x)]
paths = list(OrderedDict.fromkeys(paths))
# flip the order of the list so builtins are loaded first
return paths[::-1]
def glob_config(pattern, search_dirs=None):
"""Return glob results for all possible configuration locations.
Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory.
This is done for performance since this is usually used to find *all* configs for a certain component.
"""
patterns = config_search_paths(pattern, search_dirs=search_dirs,
check_exists=False)
for pattern_fn in patterns:
for path in glob.iglob(pattern_fn):
yield path

Either way, it clearly uses iglob which isn't going to be sorted.

As for my option 3 @BENR0, it isn't that the test fails, it is that the failing test represents a use case (as @gerritholl says): which reader gets chosen when a user doesn't specify a reader (they don't know what one they want). With this PR they seem to always get the CF reader even though there is a much better choice.

@gerritholl besides alphabetical order, I'm not sure we want to go down the performance-lossy road of checking all readers for a match against a filename. We could implement something that knows that the CF reader and the generic image reader should be a last resort:

# pseudo code
generic_choice = None
for reader, reader_info in ...:
    if reader.matches(filename):
        if not reader_info["generic"]:
            return reader
        generic_choice = reader
return generic_choice

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. This discussion reminded me of the conversation regarding overlapping file patterns in #913 which might be related.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, but they are separate bits of code. The overlapping file patterns will likely always be a little difficult. This reader choice one can have a solution, but it might slow some operations down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I exchanged "filename" for "stem" which obviously was the easy part. @djhoese @mraspaud @sfinkens @gerritholl What is the proposed way, based on the discussion above, to move this forward?

self.assertEqual(6, len(groups))

def test_unknown_files(self):
Expand Down