-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637
Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637
Conversation
I think the main purpose of this PR has been handled now in some other PR. You can see the current file types here: satpy/satpy/etc/readers/satpy_cf_nc.yaml Lines 14 to 15 in 153d868
I think we can close this? If you don't think so, let me know and we can reopen it. |
Well I would have to see the PR. The file type patterns above still don't match the patterns the cf writer uses by default. Additionally the cf reader needs the |
Here's what your (this) PR has:
And here's what is there currently:
I think those are exactly the same, right? |
:-D yes exactly the filename pattern from the reader the and writer should be the same. With my changes they are. Currently the filename pattern of the writer is: satpy/satpy/etc/writers/cf.yaml Line 5 in 62c0e83
|
Oh! 🤦 Your changes are to the writer. @TAlonglong @martin @sfinkens what do you think? Should the CF writer's default output filename be the more complex name in this PR (see my comment above) or should the CF reader be updated to also match the current simpler writer default filename (see above). |
Yes, the more complete name is better imo |
Thanks for looking into this @BENR0! I agree that the more complete name is better. Regarding the attribute name: Thinking a bit further: There's actually a standardized vocabulary for platforms, instruments etc (GCMD). For example you would specify
I think it would be nice if all the readers followed that convention, if possible. But that's clearly beyond the scope of this PR. |
@sfinkens thanks for the info. Some more thoughts from my side about the |
Yes that's good, reader and writer should be consistent!
The
I think that's fine for now. |
@djhoese @mraspaud @sfinkens working on #1695 I stumbled about this problem again and I think this is still relevant. Especially the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on updating this again. I've re-opened the PR. Looks like I closed it and forgot I had done that even though the conversation clearly led to it needing to be reopened.
I've made a few comments and things I'm questioning. There are few things in the writer that I don't think are needed anymore. Sorry for taking so long to get to this PR that part of it were handled by others.
Not sure how but it seems the tests are failing because |
Oh the The problem is that
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1637 +/- ##
==========================================
- Coverage 95.94% 95.94% -0.01%
==========================================
Files 366 366
Lines 53613 53504 -109
==========================================
- Hits 51441 51332 -109
Misses 2172 2172
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
satpy/etc/readers/satpy_cf_nc.yaml
Outdated
@@ -17,3 +17,4 @@ file_types: | |||
file_patterns: | |||
- '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' | |||
- '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' | |||
- '{filename}.nc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the GRIB2 reader, could this be {stem}.nc
? stem
would also be more accurate, as .nc
is part of the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gerritholl I would be ok with that. I just added filename
based on the generic_image
reader config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for stem
also. The generic_image can be updated later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR, I just have a question regarding the group file test.
satpy/tests/test_readers.py
Outdated
@@ -751,7 +751,7 @@ def test_no_reader(self): | |||
|
|||
# without files it's going to be an empty result | |||
assert group_files([]) == [] | |||
groups = group_files(self.g16_files) | |||
groups = group_files(self.g16_files, reader='abi_l1b') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this has to do with this PR. Moreover, the docstring of this test says that the we want to check that we can skip the reader
argument, and with change does the opposite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BENR0 can you comment on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa I did not realize what the test was actually trying to accomplish. This is one downside of the generic filename in the reader. It matches everything and has the change to "steal" matches from other readers. Either:
- This test gets updated to expect 1 group of files (not great).
- The reader gets updated with group keys to sort by start time...which might work and fix this test.
- Remove the default file pattern from the reader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well obviously I am against option 3. :-D. I think removing a "feature" because a test fails is not the way to go. I am not sure which of the remaining options is the best way to move forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Satpy needs a strategy to choose a reader if multiple readers match a file pattern. I think the current strategy is that the first reader with a match wins, but the order of readers is not explicitly defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reader order may come down to:
satpy/satpy/readers/__init__.py
Line 328 in c0b69be
reader_configs = glob_config(os.path.join('readers', '*.yaml'), search_dirs=paths) |
in most cases. There may be others. It may get even more complicated when we look at that glob_configs
function since it is searching not only Satpy's configs but any user configured paths too:
Lines 156 to 184 in c0b69be
def config_search_paths(filename, search_dirs=None, **kwargs): | |
"""Get series of configuration base paths where Satpy configs are located.""" | |
if search_dirs is None: | |
search_dirs = get_config_path_safe()[::-1] | |
paths = [filename, os.path.basename(filename)] | |
paths += [os.path.join(search_dir, filename) for search_dir in search_dirs] | |
paths += [os.path.join(PACKAGE_CONFIG_PATH, filename)] | |
paths = [os.path.abspath(path) for path in paths] | |
if kwargs.get("check_exists", True): | |
paths = [x for x in paths if os.path.isfile(x)] | |
paths = list(OrderedDict.fromkeys(paths)) | |
# flip the order of the list so builtins are loaded first | |
return paths[::-1] | |
def glob_config(pattern, search_dirs=None): | |
"""Return glob results for all possible configuration locations. | |
Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory. | |
This is done for performance since this is usually used to find *all* configs for a certain component. | |
""" | |
patterns = config_search_paths(pattern, search_dirs=search_dirs, | |
check_exists=False) | |
for pattern_fn in patterns: | |
for path in glob.iglob(pattern_fn): | |
yield path |
Either way, it clearly uses iglob
which isn't going to be sorted.
As for my option 3 @BENR0, it isn't that the test fails, it is that the failing test represents a use case (as @gerritholl says): which reader gets chosen when a user doesn't specify a reader (they don't know what one they want). With this PR they seem to always get the CF reader even though there is a much better choice.
@gerritholl besides alphabetical order, I'm not sure we want to go down the performance-lossy road of checking all readers for a match against a filename. We could implement something that knows that the CF reader and the generic image reader should be a last resort:
# pseudo code
generic_choice = None
for reader, reader_info in ...:
if reader.matches(filename):
if not reader_info["generic"]:
return reader
generic_choice = reader
return generic_choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. This discussion reminded me of the conversation regarding overlapping file patterns in #913 which might be related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar, but they are separate bits of code. The overlapping file patterns will likely always be a little difficult. This reader choice one can have a solution, but it might slow some operations down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I exchanged "filename" for "stem" which obviously was the easy part. @djhoese @mraspaud @sfinkens @gerritholl What is the proposed way, based on the discussion above, to move this forward?
satpy/etc/readers/satpy_cf_nc.yaml
Outdated
@@ -17,3 +17,4 @@ file_types: | |||
file_patterns: | |||
- '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' | |||
- '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' | |||
- '{filename}.nc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me.
Pull Request Test Coverage Report for Build 9516196907Details
💛 - Coveralls |
@@ -2,6 +2,6 @@ writer: | |||
name: cf | |||
description: Generic netCDF4/CF Writer | |||
writer: !!python/name:satpy.writers.cf_writer.CFWriter | |||
filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc' | |||
filename: '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it was mentioned in the comments, but does anyone know/remember where this -
separate filename came from? As you mentioned it seems to have existed in the reader first, but it isn't clear to me why/where from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my question here still stands, but not sure we're going to get anyone who has a real answer.
I don't use |
I removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -2,6 +2,6 @@ writer: | |||
name: cf | |||
description: Generic netCDF4/CF Writer | |||
writer: !!python/name:satpy.writers.cf_writer.CFWriter | |||
filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc' | |||
filename: '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my question here still stands, but not sure we're going to get anyone who has a real answer.
The default filename pattern of the cf writer is changed to the format the
satpy_cf_nc
reader expects. Also the "instrument" attribute which thesatpy_cf_nc
reader expects is now added by the reader.I have been trying out the
satpy_cf_nc
reader and stumbled over some things.instrument
in theget_sensor
function of the reader. This is currently not documented and is not added by the cf writer but needs to be added by the user by supplying theheader_attrs
parameter. With the same argument I think the reader should add this.Apart from this I am wondering why the attribute is named
instrument
in thesatpy_cf_nc
reader while everywhere else it seems to be namedsensor
? Maybe this should be renamed?Additionally I was wondering why this is retrieved from the global attributes since the reader relies on dynamic datasets anyway and retrieves some attributes anyway in
_assign_ds_info
which could also be used to get the sensor. I guess the sensor information might not be available in all readers?However this is not to say that global attributes should not be used. Actually maybe the cf writer should write attributes common to all datasets to the global attributes similar to the
to_xarray_dataset
scene method?Note: a lot of the cf writer tests are failing due to the
instrument
attribute addition. I wanted to hear your thoughts on this before fixing these.