Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coordinate UX #11

Closed
lpilz opened this issue Nov 15, 2021 · 9 comments · Fixed by #14
Closed

Coordinate UX #11

lpilz opened this issue Nov 15, 2021 · 9 comments · Fixed by #14
Labels
enhancement New feature or request
Milestone

Comments

@lpilz
Copy link
Collaborator

lpilz commented Nov 15, 2021

I think this is pretty straightforward as we just need the lat, lon and time coordinates, all other can be discarded. Unstaggering will be done in the variable initialization. However, we should be aware of moving-nest runs and keep the time-dependence of lat and lon for these occasions.

@lpilz lpilz added the enhancement New feature or request label Nov 15, 2021
@jthielen
Copy link
Collaborator

I think this is pretty straightforward as we just need the lat, lon and time coordinates, all other can be discarded.

I disagree with that list, as we also need the projection coordinates (south_north and west_east) and the vertical coordinate (bottom_top). However, it's still straightforward, as hopefully #14 indicates!

Unstaggering will be done in the variable initialization.

Sounds good to me to do the destaggering of coordinates at the same time as data variables.

However, we should be aware of moving-nest runs and keep the time-dependence of lat and lon for these occasions.

In what little xwrf currently has, it assumes that we don't have a moving-nest. This simplifies things greatly with respect to dimension coordinates, so that's what I've stuck with in #14. That being said, moving nests are definitely important to handle eventually, but I'd vote to not worry about that for an initial release milestone.

@lpilz
Copy link
Collaborator Author

lpilz commented Nov 23, 2021

You're right, I was maybe a bit rash. What I meant was: if we decide to destagger, we don't need to keep the staggered variables.

To me it is important to obtain clarity in the xarray API. The user should be able to see on a glance which coordinates the various diagnostics and data are on. For that we need the projection coordinates (if we're not in a lat/lon scenario). However, I think your PR makes good headway into this direction :)

That being said, moving nests are definitely important to handle eventually, but I'd vote to not worry about that for an initial release milestone.

Seconding the prioritization :)

@kmpaul kmpaul added this to xWRF Dec 16, 2021
@kmpaul kmpaul added this to the Project xWRF milestone Dec 21, 2021
@lpilz lpilz changed the title Coordinates Coordinate UX Jan 25, 2022
@lpilz
Copy link
Collaborator Author

lpilz commented Jan 25, 2022

Maybe we should use #31 as a reason to discuss which coordinates we will have in the dataset at the end.

For the time coordinate, e.g., information in wrfout files is redundant. We have the Times coordinate, which contains timestamps (in string form) and we have XTIME, which contains timedeltas (minutes since simulation start). To avoid too much redundant data, I would propose to just drop XTIME and only keep Time.

The same goes for the spatial coordinates. I would suggest making dropping redundant information a rule. So if it were not too much hassle to compute on the fly, I would propose to keep south_north, west_east and bottom_top (+ staggered? #35) and just drop XLAT, XLONG etc.. This would simply entail adding tutorials on how to generate the lats and longs.

What do you say to these suggestions?

@lpilz
Copy link
Collaborator Author

lpilz commented Jan 25, 2022

I just remembered this might also apply to data variables. For use_theta_m = 0, the THM and the T field are identical.

@kmpaul
Copy link
Contributor

kmpaul commented Jan 26, 2022

@lpilz: This sounds reasonable to me.

@jthielen
Copy link
Collaborator

jthielen commented Jan 26, 2022

For the time coordinate, e.g., information in wrfout files is redundant. We have the Times coordinate, which contains timestamps (in string form) and we have XTIME, which contains timedeltas (minutes since simulation start). To avoid too much redundant data, I would propose to just drop XTIME and only keep Time.

That makes sense to me for this kind of one-dimension coordinate (so long as Times is parsed into a datetime dimension coordinate as previously intended)!

The same goes for the spatial coordinates. I would suggest making dropping redundant information a rule. So if it were not too much hassle to compute on the fly, I would propose to keep south_north, west_east and bottom_top (+ staggered? #35) and just drop XLAT, XLONG etc.. This would simply entail adding tutorials on how to generate the lats and longs.

I'd disagree with dropping latitude and longitude, however. While it is true that they can be re-generated (with careful handling of datums), it is not always trivial or cheap to compute, particularly with large/dense domains. Having both dimension coordinates and latitudes/longitudes is important for many workflows; a key example of this is projection-correct dynamical calculations (xref Unidata/MetPy#893), where you'll simultaneously need x/west_east, y/south_north, longitude, and latitude. I'd think we'd be imposing an unnecessary burden on users if we decided to always drop latitude and longitude, especially given there isn't a significant cost to keeping them around (as they aren't dimension coordinates, they can be automatically lazy-loaded just like any data variable).

Also, generalizing from that, I don't think always "dropping redundant information" is a good rule. For example, this rule would mean

  • drop/not include pressure, since it is redundant from surface pressure, model top pressure, and sigma coordinate
  • drop/not include geopotential height, since it is redundant from geopotential
  • drop all the map factors, since they are redundant with the horizontal coordinates

I think redundant data, so long as it is lazy-loaded, still has a place and would be appreciated by users. Perhaps a better rule would be "dropping redundant 0-D and 1-D information"?

@kmpaul
Copy link
Contributor

kmpaul commented Jan 26, 2022

@jthielen: I think you've hit the nail on the head by pointing out that the data is lazy-loaded. With that in mind, I'm more in favor of not dropping anything unless it's presence creates a problem. I'm not even sure it's necessary to remove the redundant time coordinates, now that I think about it.

This is one of those things that seems to be an issue of "trying to guess what the user wants." And my experience over the years is that you should always assume that the user wants everything. So, I'm now of the opinion that we shouldn't drop anything unless it becomes a problem for us to keep it.

@lpilz
Copy link
Collaborator Author

lpilz commented Jan 27, 2022

@jthielen you are completely right regarding data variables. My point was mainly on coordinates, though. Here, I feel that we should not include too much information and keep coordinates and dimensions as close to another as possible to avoid confusing the users who are mostly coming from earth and natural sciences. However, I do see your point regarding lat/long data. I didn't know there were applications which actually directly used this info.

My idea for how this could look when all features are implemented would be:

>>> ds = xr.open_dataset('wrfout_d03_2018-10-15_00:00:00').xwrf.process()
>>> list(ds.coords)
['XLAT',
 'XLONG',
 'south_north',
 'west_east',
 'bottom_top',
 'Time']

and

>>> ds = xr.open_dataset('wrfout_d03_2018-10-15_00:00:00').xwrf.process(destagger=False)
>>> list(ds.coords)
['XLAT',
 'XLONG',
 'south_north',
 'west_east',
 'south_north_stag',
 'west_east_stag',
 'bottom_top',
 'bottom_top_stag',
 'Time']

What do you say to these?

@kmpaul if you really want to retain XTIME, I would propose to parse it as a np.timedelta64 dtype. What do you think of this idea?

@kmpaul
Copy link
Contributor

kmpaul commented Jan 27, 2022

@lpilz: Yeah. If we can process the time-like dimensions into timedelta64 or datetime64 datatypes, that would be ideal. Just keep in mind that if any changes are made to the raw data (i.e., the DataArray data doesn't exactly match the corresponding data in the original file), then the modification should be stored in the encoding.

@andersy005 andersy005 moved this from 🌳 Todo to ▶ In Progress in Xdev Project Board Jan 31, 2022
@jthielen jthielen moved this to Done in xWRF Feb 16, 2022
Repository owner moved this from ▶ In Progress to ✅ Done in Xdev Project Board Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants