Method to add missing GPS data to platform group #201

Closed
2 tasks
leewujung opened this issue Oct 15, 2020 · 11 comments

@leewujung
Member

Some EK80 data sets come without the NMEA datagrams due to hardware config variations. This is a case similar to #198 in which some environmental data are also missing in AZFP files. Let's first deal with the case when the latitude/longitude data are saved in netCDF or zarr files.

Goal

Add a method to enable adding ancillary lat/lon data into converted files, when the source data file (recorded by the instrument) does not contain these data.

Task

  • Add a method add_platform_data() to the Convert class that lets users specify one or more nc/zarr files (called GPS files below) that can be opened by xarray and contain variables named latitude and longitude, and that saves the lat/lon data to the Platform group. The method should:

    • slice out the lat/lon data section that corresponds to the ping_time start and end of the acoustic data in the Beam group, and
    • be able to combine multiple GPS files before doing the slicing, using xarray's open_mfdataset.
  • Make add_platform_data() work with either a set of individual files, each converted to nc/zarr separately, or a single combined output file produced from a list of individual files (i.e., the combine=True option).
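
The slicing step in the first task can be sketched as follows. This is only an illustration, not echopype's implementation: the helper name slice_gps_to_pings, the time variable name "time", and the toy values are all assumptions. It assumes the GPS dataset stores its timestamps as a variable along a single dimension (here obs) and keeps only the fixes between the first and last ping.

```python
import numpy as np
import xarray as xr


def slice_gps_to_pings(gps, ping_time, time_name="time"):
    """Keep only the GPS fixes that fall between the first and last ping.

    `time_name` is a hypothetical variable name; real GPS files may differ.
    """
    start, end = ping_time.min(), ping_time.max()
    t = gps[time_name]
    # Boolean mask of fixes inside the ping window, converted to integer positions
    keep = np.flatnonzero(((t >= start) & (t <= end)).values)
    # Select along whichever dimension the time variable lives on
    return gps.isel({t.dims[0]: keep})


# Toy GPS record: one fix per minute for ten minutes (made-up values)
t0 = np.datetime64("2020-10-15T00:00:00", "ns")
gps = xr.Dataset(
    {
        "time": ("obs", t0 + np.arange(10) * np.timedelta64(60, "s")),
        "latitude": ("obs", 47.0 + 0.01 * np.arange(10)),
        "longitude": ("obs", -122.0 - 0.01 * np.arange(10)),
    }
)

# Toy ping times spanning 2.5 to 6.5 minutes after t0
ping_time = t0 + np.array([150, 200, 390]) * np.timedelta64(1, "s")

sliced = slice_gps_to_pings(gps, ping_time)
```

If the GPS files instead index latitude and longitude by a time coordinate, gps.sel(time=slice(start, end)) would do the same job more directly.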

Note: this functionality should be added to the class-redesign branch.

@leewujung
Member Author

@imranmaj: could you start tackling the first task as a standalone function? Check out set_platform() in /echopype/convert/set_groups_ek80.py to see how the coordinates and attributes are encoded. The idea is to replace (or possibly overwrite) the NaN variables with actual values from the GPS files. I'll point you to where the test files are on Slack.

@ngkavin: I think the second task will require collaboration between you and @imranmaj. Could you suggest what the function arguments should look like and where the function should be called in the conversion sequence, so the three of us can discuss it as a group?

Thanks :)

@ngkavin
Contributor

ngkavin commented Oct 15, 2020

Hi @imranmaj, glad to have you helping to develop echopype.

It seems like this function really only requires one input argument, the list of .nc/.zarr files that contain the platform information, although more arguments would accommodate more use cases. The function could be called anywhere between creating the Convert object and calling to_netcdf.
For example:

tmp = Convert(ek80_raw_path, model="EK80")
tmp.add_platform_data(files)
tmp.set_param(params)
tmp.to_netcdf()

The base functionality could be splitting the GPS data while saving each .nc file, but other use cases could involve adding the GPS data after everything has been combined, so that add_platform_data would go after to_netcdf(combine=True).

@imranmaj
Contributor

imranmaj commented Oct 15, 2020

Hi! Thanks for the help.

I noticed that there's an attribute named extra_files on the Convert class. Is that attribute intended for this purpose?

@ngkavin
Contributor

ngkavin commented Oct 15, 2020

No. Some EK80 raw files contain both broadband and continuous wave backscatter data, which are not saved in the same NetCDF file. In this case, extra_files is used to keep track of the new '_cw.nc' files that are created in the conversion process.

I will probably rename the variable to cw files to make it clear that it is only used for this purpose.

@imranmaj
Contributor

Thanks!

When I try to open the files using xarray.open_mfdataset, I get the following error:

ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

I believe this is because internally, xarray.open_mfdataset calls xarray.combine_by_coords, whose documentation says

If it cannot determine the order in which to concatenate the datasets, it will raise a ValueError

It looks like the obs dimension does not have a dimension coordinate. Would I be correct in assuming that I need to add a dimension coordinate to the obs dimension (probably with the preprocess keyword argument on open_mfdataset)?

@ngkavin
Contributor

ngkavin commented Oct 17, 2020

I don't know what your GPS files look like, nor do I know what the obs dimension is, but usually you would want to concatenate on a time dimension. I don't think you need to add dimensions, because your files should already have the dimensions they needed to be saved as .nc files. Have you tried specifying combine='nested', concat_dim='name of time dimension' in open_mfdataset?

@leewujung
Member Author

@ngkavin: I'll send you a link to the files; they're on our shared drive.

@ngkavin
Contributor

ngkavin commented Oct 17, 2020

xr.open_mfdataset(files, combine='nested', concat_dim='obs')

@ngkavin ngkavin closed this as completed Oct 17, 2020
@ngkavin ngkavin reopened this Oct 17, 2020
@ngkavin
Contributor

ngkavin commented Oct 17, 2020

Accidentally closed, but xr.open_mfdataset(files, combine='nested', concat_dim='obs') works.
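
For readers who want to reproduce the difference without the GPS files, here is a minimal in-memory sketch. It assumes toy datasets whose latitude and longitude share an obs dimension with no coordinate values attached; the helper make_gps_chunk and all numbers are made up for illustration. xr.combine_by_coords and xr.combine_nested are used as in-memory counterparts of the combine='by_coords' and combine='nested' options of open_mfdataset.

```python
import numpy as np
import xarray as xr


def make_gps_chunk(start_index, n_obs=3):
    """Build a toy GPS dataset whose `obs` dimension has no coordinate values."""
    idx = np.arange(start_index, start_index + n_obs)
    return xr.Dataset(
        {
            "latitude": ("obs", 47.0 + 0.01 * idx),
            "longitude": ("obs", -122.0 - 0.01 * idx),
        }
    )


chunks = [make_gps_chunk(0), make_gps_chunk(3)]

# by_coords needs dimension coordinates to work out the ordering, so it fails here
try:
    xr.combine_by_coords(chunks)
    by_coords_failed = False
except ValueError as err:
    by_coords_failed = True
    by_coords_error = str(err)

# nested trusts the order of the list and simply concatenates along `obs`
combined = xr.combine_nested(chunks, concat_dim="obs")
```

Because the nested strategy relies on list order, the file list passed to open_mfdataset should already be sorted chronologically.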

@imranmaj
Contributor

Ah, I see, thank you. I was trying to combine with combine='by_coords'.

@leewujung
Member Author

This was supposed to be closed a long time ago. :)
