
[FEATURE]: Essential diagnostic variables #57

Closed · jthielen opened this issue Mar 6, 2022 · 3 comments · Fixed by #59
Labels: enhancement (New feature or request)
Milestone: v0.0.1

jthielen (Collaborator) commented Mar 6, 2022

Description

A key feature that was formerly part of #14, but was stripped out while we were still working out #20, was calculating the key diagnostic variables needed to use WRF output in most analysis workflows:

  • 'T' converts to potential temperature by adding a magic-number offset of 300 K
  • 'P' and 'PB' combine to form pressure
  • 'PH' and 'PHB' combine to form geopotential
  • Geopotential (see previous) converts to geopotential height using a particular value of g (9.81 m s**-2) that may not match the value used elsewhere

Implementation

This was implemented previously in #14 as

def calc_base_diagnostics(dataset, drop=True):
    """Calculate the four basic fields that WRF does not have in physically meaningful form.

    Parameters
    ----------
    dataset : xarray.Dataset
        Dataset representing WRF data opened via normal backend, with chunking.
    drop : bool
        Decide whether to drop the components of origin after creating the diagnostic fields
        from them.

    Notes
    -----
    This operation should be called before destaggering.
    """
    # Potential temperature
    dataset['air_potential_temperature'] = dataset['T'] + 300
    dataset['air_potential_temperature'].attrs = {
        'units': 'K',
        'standard_name': 'air_potential_temperature'
    }
    if drop:
        del dataset['T']

    # Pressure
    dataset['air_pressure'] = dataset['P'] + dataset['PB']
    dataset['air_pressure'].attrs = {
        'units': dataset['P'].attrs.get('units', 'Pa'),
        'standard_name': 'air_pressure'
    }
    if drop:
        del dataset['P'], dataset['PB']

    # Geopotential and geopotential height
    dataset['geopotential'] = dataset['PH'] + dataset['PHB']
    dataset['geopotential'].attrs = {
        'units': 'm**2 s**-2',
        'standard_name': 'geopotential'
    }
    dataset['geopotential_height'] = dataset['geopotential'] / 9.81
    dataset['geopotential_height'].attrs = {
        'units': 'm',
        'standard_name': 'geopotential_height'
    }
    if drop:
        del dataset['PH'], dataset['PHB']

    return dataset
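
For context, a minimal usage sketch (the wrfout filename and chunk choice here are hypothetical):

import xarray as xr

# Hypothetical filename/chunking; opening with `chunks` gives Dask-backed
# variables, so the arithmetic in calc_base_diagnostics builds lazy task graphs.
ds = xr.open_dataset('wrfout_d01_2022-03-06_00:00:00', chunks={'Time': 1})
ds = calc_base_diagnostics(ds, drop=True)
ds['air_potential_temperature']  # still lazy until .compute()/.values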

Tests

These are pretty straightforward calculations, so creating tests using our existing "raw wrfout" (i.e., not the "dummy" or "geo_em" files) should also be straightforward.
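
For instance, something along these lines (the raw_wrfout fixture name is a placeholder, not an actual fixture in the test suite):

import numpy as np

def test_calc_base_diagnostics(raw_wrfout):
    """Check the derived fields against the raw components (hypothetical fixture)."""
    ds = calc_base_diagnostics(raw_wrfout.copy(), drop=False)
    np.testing.assert_allclose(
        ds['air_potential_temperature'].values, raw_wrfout['T'].values + 300
    )
    np.testing.assert_allclose(
        ds['air_pressure'].values, (raw_wrfout['P'] + raw_wrfout['PB']).values
    )
    np.testing.assert_allclose(
        ds['geopotential_height'].values,
        (raw_wrfout['PH'] + raw_wrfout['PHB']).values / 9.81
    )
    assert ds['air_potential_temperature'].attrs['units'] == 'K'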

Questions

  • How should we handle lazy loading? With the straightforward implementation above, it should "just work" with Dask for delayed evaluation, but will otherwise evaluate eagerly. Is this okay?
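
To illustrate the lazy-vs-eager distinction (a sketch; the filename is hypothetical):

import xarray as xr

# With `chunks`, variables are Dask arrays: `T + 300` just records a task
# graph, and nothing is computed until .compute() or .values is requested.
lazy = xr.open_dataset('wrfout_d01', chunks={'Time': 1})
theta = lazy['T'] + 300       # no computation yet

# Without `chunks`, the same arithmetic loads the data and evaluates eagerly.
eager = xr.open_dataset('wrfout_d01')
theta_now = eager['T'] + 300  # computed in memory immediately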
@jthielen jthielen added the enhancement New feature or request label Mar 6, 2022
@jthielen jthielen added this to the v0.0.1 milestone Mar 6, 2022
@jthielen jthielen mentioned this issue Mar 6, 2022
andersy005 (Member) commented
Will this calculation be part of the xwrf.postprocess() or will it be a standalone function to be invoked by users explicitly?

kmpaul (Contributor) commented Mar 7, 2022

  • How should we handle lazy loading? With the straightforward implementation above, it should "just work" with Dask for delayed evaluation, but will otherwise evaluate eagerly. Is this okay?

I think that would be expected behavior for something like this. I like the idea of the diagnostic variables "being ready for access" (i.e., already set with a Python ds['variable'] = ... statement), and initializing the dataset with Dask makes it lazy-loading by default. I think that's the right approach.

jthielen (Collaborator, Author) commented Mar 7, 2022

Will this calculation be part of the xwrf.postprocess() or will it be a standalone function to be invoked by users explicitly?

Yes, since I'd view it as needed for most workflows (and without it, users could be misled by what is there). Still, it'd be easy to make it conditional, like decode_times.
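
A rough sketch of how that opt-out could look (the diagnostics keyword is hypothetical, named by analogy with decode_times):

def postprocess(dataset, diagnostics=True):
    """Hypothetical signature: apply xwrf post-processing, optionally
    computing the diagnostic fields."""
    # ... earlier post-processing steps (e.g., decoding times) ...
    if diagnostics:
        # Per the Notes above, this must run before any destaggering.
        dataset = calc_base_diagnostics(dataset)
    # ... later steps (e.g., destaggering) ...
    return dataset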

I think that would be expected behavior for something like this. I like the idea of the diagnostic variables "being ready for access" (i.e., already set with a Python ds['variable'] = ... statement), and initializing the dataset with Dask makes it lazy-loading by default. I think that's the right approach.

Sounds good, that's enough confirmation for me to make a PR based on this approach!
