Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add grib_tree method #399
Add grib_tree method #399
Changes from 2 commits
cee6410
8c2ba42
a91a1ca
3f6b93c
6e0eac3
52fdcef
0b22a2d
7f17999
37c65ce
cf63ed4
229c3da
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow for parquet/bring-your-own output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we need to figure out how parquet works at all for deep nested data like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the hierarchy in zarr is just a path embedded in the key... it might must work.
If I want to add more tests, should I checkin some json output from scan_grib for a few variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested with zarr v3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to run scan_grib using zarr v3? As implemented, grib_tree is pretty tightly coupled on the output of scan_grib. I see you can set an ENV var to allow access to the experimental API...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so version should always be 2? Or is this not the zarr version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - you are right - this is the kerchunk version, not the zarr version anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not include the message about what to do in the logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to put more of the answer in the log message - though I have not been successful in doing it myself... yet.
So far I have used the idx file to confirm that I am unlikely to need these unknown variables in my use case with HRRR and GFS grib2 files.
Example:
gs://high-resolution-rapid-refresh/hrrr.20230928/conus/hrrr.t01z.wrfsfcf05.grib2
Msg # indexed from zero
IDX file entries (messages indexed from 1)
Example:
gs://high-resolution-rapid-refresh/hrrr.20230928/conus/hrrr.t01z.wrfsfcf05.grib2.idx
The metadata in the NOAA EMC generated idx files suggests we are missing local definitions for a number of variables that represent min, max or avg values over a layer. I have asked NOAA-EMC for help with decoding these properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, we can leave this for future improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crickets over at NOAA-EMC... I think we might have a working rust gribberish reader before we get a clear answer on how to build the ecCodes tables for these unknown properties.
🤞 @mpiannucci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the tables are annoying, need to figure that out in rust too to be fully comprehensive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any cases like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the same HRRR SFCF grib2 files discussed above re: unknown variables, here are two examples of unknown levels:
Example logs
Corresponding idx file entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that we should be able to decode them? The grib object has many many possble attributes, so this seems solvable. Not needed for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I believe these are just more custom codes, similar to the unknown variables above.
I think they are all documented and verifiable by reading the idx files, but implementing the tables for ecCodes to read them is more than I can take on right now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I put this in scan_grib where the cfgrib WrongStepUnitError is handled?
I don't like doing this here. Another alternative would be to use the preprocess hook in MultiZarrToZarr?
We should be able to calculate the correct step value by subtracting the valid_time from the time (reference time).
That requires decoding those values though and then encoding the step correctly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the HRRR SubHF dataset, even the step values that are decoded (don't raise an exception) are wrong.
The values are in floating point minute but the units are specified as hours.
Only the stepType instant for the end of the forecast is correct - 1 hour.