-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modis l2 available datasets #913
Conversation
Can xarray use pyhdf? |
As for the all versus available, the Let me check your implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, great job! Nice start to getting this reader to be more flexible. There are few things that need to be changed, but it's close.
As for the tests, if you add one additional 2D variable (one not configured in the YAML) and one 3D variable to the existing tests and make sure that the 2D variable is added to the available_dataset_ids and the 3D one isn't then I think that'd be good enough.
satpy/etc/readers/modis_l2.yaml
Outdated
@@ -10,6 +10,10 @@ file_types: | |||
file_patterns: | |||
- 'M{platform_indicator:1s}D35_L2.A{acquisition_time:%Y%j.%H%M}.{collection:03d}.{production_time:%Y%j%H%M%S}.hdf' | |||
file_reader: !!python/name:satpy.readers.modis_l2.ModisL2HDFFileHandler | |||
modis_l2_product: | |||
file_patterns: | |||
- 'M{platform_indicator:1s}D{product:2s}_L2.A{acquisition_time:%Y%j.%H%M}.{collection:03d}.{production_time:%Y%j%H%M%S}.hdf' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem with this is that it overlaps with the mod35_hdf pattern so a file could be matched to either file type depending on which one was checked first. Either we have a file pattern for each possible level 2 filename or one single generic one. There are two problems with the single generic one:
- We'd have to handle
cloud_mask
specially in available_datasets so that it is only shown as available for the mod35 file. Not a huge deal, but could be confusing to some. - Bigger issue is that if someone provides multiple level 2 files there is no way for Satpy to know which files are additional granules of the same file type (all MOD35 files) or if they are different level 2 product files. The reader ends up asking every file handler for a product even though only some of them have the dataset. I ran in to this issue with the ABI L2 data files even though those aren't granules.
In the future we could make the base reader check the file handler after it is created to see if it changed it's file type. For example the ModisL2HDFFileHandler could say "I see that 'product' in the filename is XX so my file type is actually 'modis_l2_product_XX'". The base reader then knows how to organize the files. @mraspaud did you have to do anything like this for the VIIRS SDR reader when you updated it for the various file naming schemes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I guessed that this overlap might pose a problem. But I after experimenting with the order of the patterns I thought that the reader took the first match which would work in the current situation but might be difficult in the future if more datasets are added to the yaml file. I think it would be nice if the reader could also identify if any dataset in the file needs special treatment like bit decoding (then there would be no need for specifying additional datasets in the yaml file) . But this might not be easy, if doable at all.
The problem with different level 2 product files given to the reader at the same time also occurred to me but I thought that most users might be smart enough to keep product files in different directories but I guess that is a hopeful wish and sooner or later somebody will discover this "bug". I think I don't understand enough of the inner workings of how the readers are initialized (interested though to get a better understanding of what is done when and the idea behind it) to judge what would be the best way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I'm 99% sure that there is no guarantee about the order of which file pattern/file type is checked first so it might come down to luck. Maybe let's continue the file type discussion on slack to nail down what I'm thinking. I could make a separate PR for the functionality I'm thinking of and then we can incorporate it in to this PR after it is merged.
I guess it uses pyhdf in the background because I can open modis hdf files with it. |
I wouldn't be surprised if it is using NetCDF to read pyhdf by using the NetCDF C library compiled with the HDF4 C library. |
DeepCode's analysis on #c6332c found:
Top issues
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
It looks like |
c6332cb
to
f36993c
Compare
@djhoese yeah I could not remember what I did that messed things up (didn't even notice until your comment). I reset to the appropriate commit and merged main and force pushed which seemed to fix the weird amount of number of commits. |
Nice job. I'm still a little concerned by the file pattern that was added, but otherwise quickly glancing at the available datasets method, it seems pretty good. However, I think it needs to be updated to make sure the variable that it found in the file isn't one that was already configured in the YAML, right? |
That rings a bell. But I can't remember. To be honest I just fixed the commit issue. I have to look at his a little closer again. |
I don't know why netcdf writer tests are failing. Didn't touch anything related. |
Merge with |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #913 +/- ##
=======================================
Coverage 96.10% 96.11%
=======================================
Files 381 381
Lines 55544 55589 +45
=======================================
+ Hits 53383 53428 +45
Misses 2161 2161
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Pull Request Test Coverage Report for Build 12163183463Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One inline suggestion. I think there should be a test, but it might be pretty hard to write...
Co-authored-by: Panu Lahtinen <pnuu+git@iki.fi>
Darn I am pretty sure I did write tests at some point but can't find/recover them. I guess they got lost in the git mixup when I re-merged main and force pushed :-/. I will add some tests again. |
Adds the available_datasets method to the modis level 2 reader to get datasets of arbitrary modis level 2 product files which are not specified in the readers yaml file as suggested in #812 (review).
Currently only datasets which have a 2D shape are added and bit encoded fields are not decoded.
There are lots of level two products out there so this might not yet work in every case. I tested a couple of products which worked fine. I am not sure how to implement mock tests which cover all these different files because I am not very fit with writing tests with mock. If someone can help me out/ give me some hints I can implement those so that this can eventually be merged.
Also there is a problem with the output of the
available_dataset_ids
ofScene
when this is used with the cloud_mask dataset of the mod35_l2 files which are already specified in the yaml file. While the dataset can be loaded it is not shown as available whenavailable_dataset_ids
is envoked.The reason for this as fas as I can see is that this method returns the
available_ids
keys in the FileYAMLReader heresatpy/satpy/readers/yaml_reader.py
Line 282 in 35739e4
and not the
all_ids
attribute where all of them are included. My guess is thatavailable_ids
should also be updated with the contents ofall_ids
insatpy/satpy/readers/yaml_reader.py
Line 540 in 35739e4
Not directly related to the main purpose of this PR I also added
add_offset
to thescale_factor
conversion in the hdfeos_base which was missing. Currently data is read with pyhdf, maybe the reading could be changed to xarray which would do this conversion automatically and also reads the other attributes which might be desireable?flake8 satpy
AUTHORS.md
if not there already