Modis l2 available datasets #913

Merged
merged 22 commits into from
Jan 21, 2025

Conversation

BENR0
Collaborator

@BENR0 BENR0 commented Sep 25, 2019

Adds the available_datasets method to the modis level 2 reader to get datasets of arbitrary modis level 2 product files which are not specified in the readers yaml file as suggested in #812 (review).

Currently only datasets which have a 2D shape are added and bit encoded fields are not decoded.
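To make the 2D restriction concrete, here is a minimal sketch of that kind of filter. It is not the code from this PR: `find_2d_dataset_names` and the `shapes` mapping are hypothetical stand-ins for whatever the file handler reads from the file, and the variable names and shapes are made up.

```python
def find_2d_dataset_names(shapes):
    """Return the sorted names of all variables with exactly two dimensions.

    ``shapes`` maps a variable name to its shape tuple (hypothetical input).
    """
    return sorted(name for name, shape in shapes.items() if len(shape) == 2)


# Made-up example content of a level 2 file.
shapes = {
    "variable_a": (406, 270),      # 2D: kept
    "variable_b": (2030, 1354),    # 2D: kept
    "variable_c": (6, 2030, 1354), # 3D: skipped
    "variable_d": (10,),           # 1D: skipped
}

names_2d = find_2d_dataset_names(shapes)
print(names_2d)
```

Bit-encoded fields would still come back as raw integers with a filter like this, which matches the limitation described above.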

There are lots of level 2 products out there, so this might not yet work in every case. I tested a couple of products, which worked fine. I am not sure how to implement mock tests which cover all these different files because I am not very experienced with writing tests using mock. If someone can help me out or give me some hints, I can implement those so that this can eventually be merged.

Also there is a problem with the output of the available_dataset_ids of Scene when this is used with the cloud_mask dataset of the mod35_l2 files which are already specified in the yaml file. While the dataset can be loaded, it is not shown as available when available_dataset_ids is invoked.
The reason for this, as far as I can see, is that this method returns the available_ids keys in the FileYAMLReader here

def available_dataset_ids(self):

and not the all_ids attribute where all of them are included. My guess is that available_ids should also be updated with the contents of all_ids in
def update_ds_ids_from_file_handlers(self):

Not directly related to the main purpose of this PR, I also added add_offset to the scale_factor conversion in the hdfeos_base, which was missing. Currently data is read with pyhdf. Maybe the reading could be changed to xarray, which would do this conversion automatically and also read the other attributes, which might be desirable?
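For readers unfamiliar with the conversion, a minimal sketch of applying both attributes is shown here. It assumes the CF-style convention `raw * scale_factor + add_offset`; `apply_scale_offset` is a hypothetical helper, not the function in hdfeos_base, and the numbers are made up. Some MODIS products document the alternative form `scale_factor * (raw - add_offset)`, so which form applies should be checked against the product documentation.

```python
import numpy as np


def apply_scale_offset(raw, scale_factor=1.0, add_offset=0.0):
    """Convert stored integers to physical values (CF-style convention, assumed)."""
    return raw.astype(np.float64) * scale_factor + add_offset


# Made-up stored values and attributes.
raw = np.array([0, 100, 200], dtype=np.int16)
scaled = apply_scale_offset(raw, scale_factor=0.01, add_offset=5.0)
print(scaled)
```

Leaving out `add_offset` only goes unnoticed for variables where the offset is zero, which may be why the omission was easy to miss.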

  • Tests added and test suite added to parent suite
  • Tests passed
  • Passes flake8 satpy
  • Fully documented
  • Add your name to AUTHORS.md if not there already

@djhoese
Member

djhoese commented Sep 27, 2019

Currently data is read with pyhdf. Maybe the reading could be changed to xarray, which would do this conversion automatically and also read the other attributes, which might be desirable?

Can xarray use pyhdf?

@djhoese
Member

djhoese commented Sep 27, 2019

As for the all versus available, the available_dataset_ids are only the datasets that are available in the files provided so they should not be merged with the "all" datasets (which are all of the possible datasets we know about regardless of the files provided).
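The distinction can be sketched with plain lists. This is an illustration only, assuming dataset IDs can be treated as simple names; `filter_available` is a hypothetical helper and does not reflect how FileYAMLReader stores its IDs.

```python
def filter_available(all_ids, names_in_files):
    """Keep only the known IDs that the provided files actually contain."""
    present = set(names_in_files)
    return [ds_id for ds_id in all_ids if ds_id in present]


# "all": every dataset the reader knows about, regardless of the files given.
all_ids = ["cloud_mask", "latitude", "longitude"]
# Made-up list of what the provided files contain.
names_in_files = ["latitude", "longitude", "some_other_variable"]

available_ids = filter_available(all_ids, names_in_files)
print(available_ids)
```

Under this reading, merging "all" into "available" would advertise `cloud_mask` even for files that do not contain it.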

Let me check your implementation.

Member

@djhoese djhoese left a comment

Wow, great job! Nice start to getting this reader to be more flexible. There are a few things that need to be changed, but it's close.

As for the tests, if you add one additional 2D variable (one not configured in the YAML) and one 3D variable to the existing tests and make sure that the 2D variable is added to the available_dataset_ids and the 3D one isn't then I think that'd be good enough.

@@ -10,6 +10,10 @@ file_types:
file_patterns:
- 'M{platform_indicator:1s}D35_L2.A{acquisition_time:%Y%j.%H%M}.{collection:03d}.{production_time:%Y%j%H%M%S}.hdf'
file_reader: !!python/name:satpy.readers.modis_l2.ModisL2HDFFileHandler
modis_l2_product:
file_patterns:
- 'M{platform_indicator:1s}D{product:2s}_L2.A{acquisition_time:%Y%j.%H%M}.{collection:03d}.{production_time:%Y%j%H%M%S}.hdf'
Member

One problem with this is that it overlaps with the mod35_hdf pattern so a file could be matched to either file type depending on which one was checked first. Either we have a file pattern for each possible level 2 filename or one single generic one. There are two problems with the single generic one:

  • We'd have to handle cloud_mask specially in available_datasets so that it is only shown as available for the mod35 file. Not a huge deal, but could be confusing to some.
  • Bigger issue is that if someone provides multiple level 2 files there is no way for Satpy to know which files are additional granules of the same file type (all MOD35 files) or if they are different level 2 product files. The reader ends up asking every file handler for a product even though only some of them have the dataset. I ran into this issue with the ABI L2 data files even though those aren't granules.
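The overlap can be demonstrated with rough regular-expression equivalents of the two file patterns. This is a sketch under assumptions: the reader matches filenames with the pattern strings from the YAML rather than these regexes, `matching_file_types` is a hypothetical helper, and the filenames are made up.

```python
import re

# Shared tail: _L2.A<YYYYDDD>.<HHMM>.<collection>.<production time>.hdf
_TAIL = r"_L2\.A\d{7}\.\d{4}\.\d{3}\.\d{13}\.hdf$"
PATTERNS = {
    "mod35_hdf": re.compile(r"^M.D35" + _TAIL),
    "modis_l2_product": re.compile(r"^M.D.{2}" + _TAIL),
}


def matching_file_types(filename):
    """Return every file type whose (approximated) pattern matches the filename."""
    return [ftype for ftype, regex in PATTERNS.items() if regex.match(filename)]


mod35_matches = matching_file_types("MOD35_L2.A2019268.1200.061.2019268201500.hdf")
mod06_matches = matching_file_types("MOD06_L2.A2019268.1200.061.2019268201500.hdf")
print(mod35_matches)
print(mod06_matches)
```

A MOD35 filename satisfies both patterns, so which file type it ends up with depends on which pattern is tried first.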

In the future we could make the base reader check the file handler after it is created to see if it changed its file type. For example the ModisL2HDFFileHandler could say "I see that 'product' in the filename is XX so my file type is actually 'modis_l2_product_XX'". The base reader then knows how to organize the files. @mraspaud did you have to do anything like this for the VIIRS SDR reader when you updated it for the various file naming schemes?
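A rough sketch of that idea, outside of Satpy: derive a per-product file type from the filename and group the files by it. `group_by_product` and the regex are hypothetical, the filenames are made up, and nothing here reflects an existing base reader API.

```python
import re
from collections import defaultdict

# Captures the two-character product code, e.g. "35" in "MOD35_L2...." (assumed layout).
PRODUCT_RE = re.compile(r"^M.D(?P<product>.{2})_L2\.")


def group_by_product(filenames):
    """Group filenames under a derived file type such as 'modis_l2_product_35'."""
    groups = defaultdict(list)
    for filename in filenames:
        match = PRODUCT_RE.match(filename)
        if match is None:
            continue  # not a level 2 filename of the assumed form
        groups["modis_l2_product_" + match.group("product")].append(filename)
    return dict(groups)


files = [
    "MOD35_L2.A2019268.1200.061.2019268201500.hdf",
    "MOD35_L2.A2019268.1205.061.2019268201500.hdf",
    "MOD06_L2.A2019268.1200.061.2019268201500.hdf",
]
groups = group_by_product(files)
print(groups)
```

With a grouping like this, the two MOD35 granules would be treated as one file type and the MOD06 file as another, so a dataset request would only go to handlers that could have it.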

Collaborator Author

Yes, I guessed that this overlap might pose a problem. But after experimenting with the order of the patterns I thought that the reader took the first match, which would work in the current situation but might be difficult in the future if more datasets are added to the yaml file. I think it would be nice if the reader could also identify whether any dataset in the file needs special treatment like bit decoding (then there would be no need for specifying additional datasets in the yaml file). But this might not be easy, if doable at all.

The problem with different level 2 product files being given to the reader at the same time also occurred to me, but I thought that most users might be smart enough to keep product files in different directories. I guess that is wishful thinking, and sooner or later somebody will discover this "bug". I don't think I understand enough of the inner workings of how the readers are initialized to judge what would be the best way (though I am interested in getting a better understanding of what is done when, and the idea behind it).

Member

Understood. I'm 99% sure that there is no guarantee about the order in which file patterns/file types are checked, so it might come down to luck. Maybe let's continue the file type discussion on slack to nail down what I'm thinking. I could make a separate PR for the functionality I'm thinking of and then we can incorporate it into this PR after it is merged.

@BENR0
Collaborator Author

BENR0 commented Sep 30, 2019

Currently data is read with pyhdf. Maybe the reading could be changed to xarray, which would do this conversion automatically and also read the other attributes, which might be desirable?

Can xarray use pyhdf?

I guess it uses pyhdf in the background because I can open modis hdf files with it.

@djhoese
Member

djhoese commented Sep 30, 2019

I wouldn't be surprised if it is using NetCDF to read pyhdf by using the NetCDF C library compiled with the HDF4 C library.

@mraspaud mraspaud added component:readers enhancement code enhancements, features, improvements labels Oct 24, 2019
@ghost

ghost commented Dec 31, 2020

DeepCode's analysis on #c6332c found:

  • ⚠️ 1 warning, ℹ️ 38 minor issues. 👇
  • ✔️ 31 issues were fixed.

Top issues

  • Defining only __eq__ but not __ne__ will result in a Python2 error if objects are compared with inequality.
  • Unused CRS imported from pyproj
  • Statement seems to have no effect

@djhoese
Member

djhoese commented Aug 3, 2022

It looks like git got the best of this pull request with 2558 commits being shown. I can't even get github to load all the file changes to show me what changed to the hdfeos_base.py file. @BENR0 How do you feel about this PR? Do you want me to try to extract the old history before the merge with main corrupted things?

@BENR0 BENR0 force-pushed the modis_l2_available_datasets branch from c6332cb to f36993c Compare August 5, 2022 08:19
@BENR0
Collaborator Author

BENR0 commented Aug 5, 2022

@djhoese yeah I could not remember what I did that messed things up (didn't even notice until your comment).

I reset to the appropriate commit, merged main, and force pushed, which seemed to fix the weird number of commits.

@djhoese
Member

djhoese commented Aug 5, 2022

Nice job. I'm still a little concerned by the file pattern that was added, but otherwise quickly glancing at the available datasets method, it seems pretty good. However, I think it needs to be updated to make sure the variable that it found in the file isn't one that was already configured in the YAML, right?
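The check being asked for could look roughly like the sketch below. It assumes the YAML-configured datasets can be identified by the variable name they read from the file; `unconfigured_names` is a hypothetical helper and the names are illustrative, not taken from the reader.

```python
def unconfigured_names(names_in_file, configured_file_keys):
    """Return the variables found in the file that the YAML does not already cover."""
    configured = set(configured_file_keys)
    return [name for name in names_in_file if name not in configured]


# Made-up file content; "Cloud_Mask" stands in for a variable already configured in the YAML.
names_in_file = ["Cloud_Mask", "variable_a", "variable_b"]
configured_file_keys = ["Cloud_Mask"]

new_names = unconfigured_names(names_in_file, configured_file_keys)
print(new_names)
```

Only the remaining names would then be offered as dynamically discovered datasets, so a configured dataset is not advertised twice.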

@BENR0
Collaborator Author

BENR0 commented Aug 5, 2022

However, I think it needs to be updated to make sure the variable that it found in the file isn't one that was already configured in the YAML, right?

That rings a bell. But I can't remember. To be honest, I just fixed the commit issue. I have to look at this a little closer again.

@BENR0
Collaborator Author

BENR0 commented Dec 12, 2023

I don't know why netcdf writer tests are failing. Didn't touch anything related.

@pnuu
Member

pnuu commented Dec 12, 2023

Merge with main and they'll work again, there was another XArray release without compression kwarg fixes. See #2674


codecov bot commented Dec 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.11%. Comparing base (38d106d) to head (27b53c3).
Report is 25 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #913   +/-   ##
=======================================
  Coverage   96.10%   96.11%           
=======================================
  Files         381      381           
  Lines       55544    55589   +45     
=======================================
+ Hits        53383    53428   +45     
  Misses       2161     2161           
Flag Coverage Δ
behaviourtests 3.89% <0.00%> (-0.01%) ⬇️
unittests 96.20% <100.00%> (+<0.01%) ⬆️

@coveralls

coveralls commented Dec 12, 2023

Pull Request Test Coverage Report for Build 12163183463

Details

  • 48 of 48 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 96.211%

Totals Coverage Status
Change from base Build 12131887891: 0.02%
Covered Lines: 53301
Relevant Lines: 55400

💛 - Coveralls

Member

@pnuu pnuu left a comment

One inline suggestion. I think there should be a test, but it might be pretty hard to write...

Co-authored-by: Panu Lahtinen <pnuu+git@iki.fi>
@BENR0
Collaborator Author

BENR0 commented Dec 18, 2023

Darn I am pretty sure I did write tests at some point but can't find/recover them. I guess they got lost in the git mixup when I re-merged main and force pushed :-/. I will add some tests again.

@mraspaud mraspaud merged commit d96b46f into pytroll:main Jan 21, 2025
15 of 18 checks passed
@BENR0 BENR0 deleted the modis_l2_available_datasets branch January 21, 2025 11:37
Labels
component:readers enhancement code enhancements, features, improvements
Projects
Status: Done

6 participants