-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add channel aliases to the CLAVRx reader to facilitate composites #2441
Conversation
…flectance derived information for the L1b channels at 2 km.
I wonder if maybe we could have a more descriptive title to this PR? 😜 |
… the need for a special lookup within reader Add tests to make sure aliases are created.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2441 +/- ##
==========================================
+ Coverage 95.40% 95.91% +0.50%
==========================================
Files 371 373 +2
Lines 52825 52946 +121
==========================================
+ Hits 50399 50782 +383
+ Misses 2426 2164 -262
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Updated tests to use fake goes data to test existing alias creation Reduced aliases to ABI and VIIRS channels
repeating parameterize decorators can be removed.
satpy/readers/clavrx.py
Outdated
data = data.where((data >= valid_min) & (data <= valid_max), fill) | ||
else: | ||
if isinstance(valid_range, np.ndarray): | ||
valid_min = _CLAVRxHelper._scale_data(valid_range[0], factor, offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the proper way to patch this _scale_data function and use an assert_called() test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easiest might be to extract _scale_data
outside the class and then do the mocking. To do it "proper" (read: the way I would prefer) do:
from satpy.readers.clavrx import _scale_data
with mock.patch("satpy.readers.clavrx._scale_data", wraps=_scale_data) as scale_data:
...
scale_data.assert_called()
But overall it may be even easier if you just check the final result and make sure it has the scaled values you expect.
Fix indent when creating info for alias
… in some of the tests for yaml and expected datasets
Pull Request Test Coverage Report for Build 7933045894Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
… is important for AWIPS) It also seems important that integer (flag data) does not have a valid range provided. Add tests to flag this problem if it appears again.
…s behind Merge branch 'main' of github.com:pytroll/satpy into main
Merge branch 'main' of github.com:pytroll/satpy into clavrx_cloud_rgb
Merge branch 'clavrx_cloud_rgb' of github.com:joleenf/satpy into clavrx_cloud_rgb
Check for a float32 because that is what satpy is expecting Update valid_range to list if np.ndarray to fix type error in scale_data.
I am not sure I want to do anything about the CodeFactor fail. I am not certain there is anything I should do about the CI/test (3.11, ubuntu-latest) test. |
Don't worry about the CI failure. The "true" at the end of that is that it is using experimental versions of dependencies. With numpy 2.0 and some other dependencies we know this is failing. The CodeFactor thing I would like you to fix. It is complaining about the spacing you used. Let's remove any unnecessary spacing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple suggestions, but otherwise this looks mostly good to me. I'm wondering also if the tests (which we may have talked about) should be in a separate test_clavrx/
directory or something just to group them together. That way it might be easier to have any shared logic in some kind of _shared.py
or _utils.py
or something. Just an idea.
satpy/readers/clavrx.py
Outdated
@@ -94,9 +108,29 @@ def _get_rows_per_scan(sensor: str) -> Optional[int]: | |||
return None | |||
|
|||
|
|||
def _scale_data(data_arr: Union[xr.DataArray, int], scale_factor: float, add_offset: float) -> xr.DataArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not requesting a change here, but FYI if you do from __future__ import annotations
at the top of the module you can then use xr.DataArray | int
here instead of the Union
.
satpy/readers/clavrx.py
Outdated
res = resolution_from_filename_info | ||
if res.endswith("m"): | ||
return int(res[:-1]) | ||
elif res is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif
is unnecessary. Not sure why the linter didn't complain about this. Maybe it isn't enabled on satpy or ruff isn't checking it yet.
elif res is not None: | |
if res is not None: |
But also...if res
can be None
then won't the previous res.endswith
fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than setting global attributes, is the nadir resolution used in remapping? I am wondering if it is okay if the nadir resolution is returned as none? There should be enough here to always have a result. I am just wondering if I should just let the reader fail if there is a None value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like self.resolution
is used to fill in ds_info["resolution"]
in the datasets. This becomes part of the DataArray.attrs
but also part of the DataID
that Satpy uses to identify each dataset/product. I think it is allowed to be None in most cases when it is unknown or not applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an odd mix. Not sure if I should also re-work how the resolution value is extracted from the string. Sometimes I get too fancy, but I was thinking, hypothetically, resolution could be a different unit system. It is highly unlikely, but not impossible.
Is this overboard?
def _get_nadir_resolution(sensor, resolution_from_filename_info):
"""Get nadir resolution."""
for k, v in NADIR_RESOLUTION.items():
if sensor.startswith(k):
return v
if resolution_from_filename_info is None:
return None
elif isinstance(resolution_from_filename_info, int):
return int(resolution_from_filename_info)
elif isinstance(resolution_from_filename_info, str):
print(resolution_from_filename_info)
result = ''.join([i for i in resolution_from_filename_info if i.isdigit()])
return int(result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little simpler:
def _get_nadir_resolution(sensor, resolution_from_filename_info):
"""Get nadir resolution."""
for k, v in NADIR_RESOLUTION.items():
if sensor.startswith(k):
return v
if resolution_from_filename_info is None:
return None
if isinstance(resolution_from_filename_info, str):
resolution_from_filename_info = ''.join([i for i in resolution_from_filename_info if i.isdigit()])
return int(resolution_from_filename_info)
Right now Satpy doesn't support units other than meters in the resolution field...although it also doesn't use it beyond the user requesting a specific resolution. I'd say removing the "m" is probably good enough unless you have evidence that a "123 degrees" or something might exist. Oh I guess there could be "km" but in that case you need to actually convert that or specify the units when assigning the resolution in some way. So maybe it is best to let it fail automatically by only removing the "m" (if isinstance(resolution_from_filename_info, str) and resolution_from_filename_info.startswith("m"):
)
satpy/readers/clavrx.py
Outdated
"""Scale data, if needed.""" | ||
scaling_needed = not (scale_factor == 1.0 and add_offset == 0.0) | ||
if scaling_needed: | ||
data_arr = data_arr * scale_factor + add_offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you/we need np.float32(scale_factor)
(and on the offset) to make sure 64-bit floats aren't created unnecessarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
satpy/readers/clavrx.py
Outdated
proj, | ||
ncols, | ||
nlines, | ||
np.asarray(area_extent)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this np.asarray
was ever a thing, but area_extent should always be a tuple when passed to AreaDefinition.
satpy/readers/clavrx.py
Outdated
if matches and ds_info.get("resolution") != self.resolution: | ||
# reader knows something about this dataset (file type matches) | ||
# add any information that this reader can add. | ||
new_info = ds_info.copy() | ||
new_info["resolution"] = self.resolution | ||
handled_vars.add(ds_info["name"]) | ||
yield self.file_type_matches(ds_info["file_type"]), ds_info | ||
yield from self._available_new_datasets(handled_vars) | ||
yield True, new_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the resolution check here be checking for is None
? What is the case where we, the file handler, are configured to handle this variable but the resolution is defined as something other than what we contain (currently resolution != self.resolution)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part has been tricky for me to get correct. The structure is essentially following Example 1. Are you suggesting:
matches = self.file_type_matches(ds_info["file_type"])
if matches and ds_info.get("resolution") != self.resolution:
# reader knows something about this dataset (file type matches)
# add any information that this reader can add.
new_info = ds_info.copy()
if self.resolution is not None:
new_info["resolution"] = self.resolution
handled_vars.add(ds_info["name"])
yield True, new_info
yield from self._available_file_datasets(handled_vars)
It has become a little complex trying to get the aliases to work within the available datasets, but that has nothing to do with the question of resolution, it just adds layers of yield statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The git diff is a little complicated now, but I kind of see the reasoning behind the way I made "Example 1". I think the idea is something like MODIS L1b files where each resolution is in a different file (250m, 500m, 1km). So in that case it is very likely that we could be looking at a ds_info
that has a resolution different than our own and I suppose the file matches is just a "shortcut" condition to say "yeah, we definitely don't know how to handle this dataset".
I guess what I'm saying is leave it the way you have it if it works for you.
Updates logic for the resolution value from the file.
ncols, | ||
nlines, | ||
np.asarray(area_extent)) | ||
f"{sensor}_geos", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removes the type np.asarray as requested. Simplifies the description using f-strings.
proj, | ||
ncols, | ||
nlines, | ||
area_extent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resolved the np.asarray issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a /__init__.py
in your new clavrx test directory. Maybe not because pytest will find it anyway, but we generally think of the test directories as part of the package so they need the init module.
I have a feeling ruff tried to change your apostrophes to double quotes when it thought they were single quotes. I'm hoping it will be OK with my latest commit. |
Thanks for sticking with it. I figure at this point you are the owner of this code so if anything goes wrong you're likely to be the one to identify it and fix it. I tried to review what I could and what I understood. Nice job. |
Thank-you Dave. |
This PR updates the clavrx reader with aliases for some visible channels to allow the use of the cimss cloud type composite and color bar (4-Level) for the cloud mask. Test updates have been made to verify that the aliases are being created correctly.