Convenience selection function for sub-parts of NCDataset #47
Comments
Notice that this […]
It is great to see your enthusiasm to improve this package! Your help is very welcome. Maybe we could have a `select` function:

```julia
using Dates, IntervalSets, NCDatasets
ds = NCDataset("ECMWF_ERA-40_subset.nc")
ncv = ds["tp"]
select(ncv, time = Date(2000,1,1)..Date(2001,12,31))
# returns a CFVariable
select(ds, time = Date(2000,1,1)..Date(2001,12,31))
# returns an NCDataset in which every variable with a time dimension is sliced;
# variables without a time dimension are not sliced
```

I played a bit with xarray, and I noticed that even indexing the […]

```python
import xarray as xr
# file from https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc
ds = xr.open_dataset("ECMWF_ERA-40_subset.nc")
a = ds["tp"][:,:,1]         # a is still an xarray.core.dataarray.DataArray
b = ds["tp"][:,:,1].values  # now we have the actual data (a numpy.ndarray)
```

Maybe it would also be useful to have a function that works on indices and only virtually subsets the data, a bit like the Julia function `view`:

```julia
view(ncv, :, :, 1)
# also returns a CFVariable with the first time instance
view(ds, time = 1:3)
# returns an NCDataset with the first 3 time instances
```

While the arguments for […]
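Both proposals come down to two steps. The first is turning a closed interval on a coordinate into an index range, which `select` needs. The second is slicing only the variables that have the given dimension, without copying. Below is a minimal sketch in plain Julia under stated assumptions. The coordinate is assumed to be sorted ascending, and the "dataset" is a hypothetical `Dict` of `name => (dimension names, data)` rather than an `NCDataset`. `interval_indices` and `select_dim` are illustrative names, not NCDatasets API.

```julia
using Dates

# Hypothetical helpers sketching the proposed `select` / `view(ds, time = ...)`;
# they are NOT part of NCDatasets.

# Index range of the elements of an ascending coordinate vector that lie in
# the closed interval [lo, hi] (an empty range if none do).
function interval_indices(coord::AbstractVector, lo, hi)
    return searchsortedfirst(coord, lo):searchsortedlast(coord, hi)
end

# Apply `inds` along dimension `dim` of every variable that has it, using views
# (no copy). Variables without that dimension are passed through unchanged.
function select_dim(vars::AbstractDict{String,<:Tuple}, dim::String, inds)
    out = Dict{String,Any}()
    for (name, (dnames, data)) in vars
        pos = findfirst(==(dim), dnames)
        if pos === nothing
            out[name] = (dnames, data)
        else
            # Colon for every other dimension, `inds` along `dim`
            idx = ntuple(d -> d == pos ? inds : Colon(), ndims(data))
            out[name] = (dnames, view(data, idx...))
        end
    end
    return out
end

times = collect(Date(1999, 1, 1):Month(6):Date(2002, 1, 1))  # 7 half-yearly dates
vars = Dict{String,Tuple}(
    "tp"  => (("longitude", "latitude", "time"), rand(4, 3, length(times))),
    "lon" => (("longitude",), collect(1.0:4.0)),
)

inds = interval_indices(times, Date(2000, 1, 1), Date(2001, 12, 31))  # 3:6
subset = select_dim(vars, "time", inds)
size(subset["tp"][2])   # (4, 3, 4); "lon" is passed through untouched
```

With IntervalSets.jl, the two bounds could be taken from `lo..hi` via `leftendpoint` and `rightendpoint`. A descending coordinate, which is common for latitude, would need `rev = true` in both `searchsorted*` calls.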
|
Exactly, and I find this a neat feature. It helps a lot not to have to constantly remember which dimension of the dataset is which. Also, the pretty printing for […]
Oh wow, this is a really good suggestion! So, to get it right, you are suggesting that two syntaxes should exist when accessing a […]
I think this is a good idea! Here is the thing to really consider: which of these two operations do you think should get the "short" syntax, i.e. […]? I now also realize that having special syntax for getting only the numerical values as an […]
Yes, this is exactly right.
I see your point about not changing the type by indexing, but changing the behavior of […]. But in fact, there are some differences between NumPy and Julia that, to me, justify a different approach in NCDatasets than in xarray. In NumPy, slicing creates a view:

```python
import numpy as np
a = np.array([1, 2, 3, 4])
b = a[:]    # basic slicing returns a view that shares memory with a
b[1] = 10
a           # returns array([ 1, 10,  3,  4]): a was modified through b
```

In Julia, by contrast, indexing copies the data; for NCDatasets you can think of it as copying the data from disk to memory. In Julia, […]
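You can check the Julia side of this comparison on an ordinary in-memory array. Indexing with `:` or a range returns a copy. `view`, or the `@view` macro, returns a `SubArray` that shares memory with its parent. The example only shows Base Julia behaviour. For an NCDatasets variable, the "copy" is the read from disk.

```julia
a = [1, 2, 3, 4]

b = a[:]            # indexing copies: b is a new Vector
b[2] = 10           # Julia is 1-based, so this mirrors NumPy's b[1] = 10
@show a             # a = [1, 2, 3, 4]   (unchanged)

c = view(a, :)      # a SubArray sharing memory with a
c[2] = 10
@show a             # a = [1, 10, 3, 4]  (modified through the view)

d = @view a[2:3]    # macro form of the same idea
@show d isa SubArray
```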
Okay, this is the sentence that convinced me! Therefore, let's move forward with your approach and define […]
Woohooo, I got a "brilliant" idea for a convenience syntax that retains the […]:

```julia
v[:, :, 1:5]  # copies the data, returns an Array
v(:, :, 1:5)  # views the data, returns a CFVariable
```

This would allow short syntax for both operations! I know how to implement the parenthesis syntax, once you guide me on how to actually do it with respect to the source code. The first step would be for you to tell me which parts of the source to read in detail, and I'll try to take it from there.
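In Julia, making an object callable is just a method definition on its type. Once a variable type exists, the proposed `v(:, :, 1:5)` syntax therefore takes one line. Below is a minimal sketch that uses a hypothetical in-memory `DemoVar` in place of `CFVariable`; it is not NCDatasets code. Square-bracket indexing goes through the generic `AbstractArray` fallback and returns a new `Array`. Calling the object returns a `view`.

```julia
# Hypothetical stand-in for a variable type; wraps an in-memory array.
struct DemoVar{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

# Minimal AbstractArray interface: size and scalar getindex.
Base.size(v::DemoVar) = size(v.data)
Base.getindex(v::DemoVar{T,N}, I::Vararg{Int,N}) where {T,N} = v.data[I...]

# Make the object callable: v(inds...) returns a lazy view instead of a copy.
(v::DemoVar)(inds...) = view(v, inds...)

v = DemoVar(rand(4, 3, 10))
x = v[:, :, 1:5]   # generic AbstractArray indexing: copies into a new Array
y = v(:, :, 1:5)   # SubArray wrapping v; no data copied
```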
I think I would start by implementing a view array type for a sub-array which does not fetch the elements individually. Maybe `SubArray` can be used for this purpose? Then a […]. As you have seen, […]. A significant addition to the code was the support for multiple files (in particular, avoiding opening all files at the same time). These are the types […]. Concerning the […]
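One way to get a view that does not fetch elements one by one is to keep Base's `SubArray`. You then specialise how it is materialised when its parent is a disk-backed variable. The sketch below assumes a hypothetical `DiskMock` type whose "disk" is an in-memory array and which counts reads. The hypothetical `readblock` stands in for a single bulk read, which for NetCDF would be one hyperslab request. This illustrates the idea only; it is not how NCDatasets implements it.

```julia
# Hypothetical stand-in for a disk-backed variable; every access is counted.
struct DiskMock{T,N} <: AbstractArray{T,N}
    data::Array{T,N}              # pretend this lives on disk
    nreads::Base.RefValue{Int}    # number of "disk accesses" so far
end
DiskMock(data::Array) = DiskMock(data, Ref(0))

Base.size(v::DiskMock) = size(v.data)

# Element-wise access: one "disk access" per element.
function Base.getindex(v::DiskMock{T,N}, I::Vararg{Int,N}) where {T,N}
    v.nreads[] += 1
    return v.data[I...]
end

# Hypothetical bulk read: one "disk access" for a whole block of indices.
function readblock(v::DiskMock, inds...)
    v.nreads[] += 1
    return v.data[inds...]
end

# Materialise a view of a DiskMock with a single bulk read instead of
# fetching each element through getindex.
Base.Array(s::SubArray{T,N,<:DiskMock}) where {T,N} =
    readblock(parent(s), parentindices(s)...)

v = DiskMock(reshape(collect(1.0:24.0), 2, 3, 4))
s = view(v, :, :, 2)    # creating the view reads nothing
m = Array(s)            # one bulk read for the whole 2×3 slice
```

Iterating the view element by element, for example with `collect(s)`, would still go through `getindex` once per element. A real implementation would therefore need to decide which operations get the bulk path.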
Okay! I'll start working on […]
Why would you have this in an extension package and not here? (As far as Julia is concerned, yes, you can; this is just a method definition. But why?) Kind of unrelated to this issue, but the above question reminded me to ask you again (cf. the discussion in JuliaClimate/ClimateTools.jl#65), since such things are good to clarify early on: are you willing to move NCDatasets.jl into an organization (in that discussion, JuliaClimate) and allow other people to also have maintainer status over it? I have been involved in long discussions with many people in the Geo/Climate fields in Julia about the many benefits of having packages in organizations (along with the difficulties). If you are really interested in reading about it, these two Discourse posts are relevant: https://discourse.julialang.org/t/newcomer-contributor-in-juliageo-and-co-help-me-get-started/32480 and https://discourse.julialang.org/t/how-can-we-create-a-leaner-ecosystem-for-julia/32904/25?u=datseris onwards. There I've tried to point out the many reasons why this is a good thing to do.
I am a bit confused about the separation of […]. Regarding #51, a solution could be to make both […]
Actually, is there even a way to create / get a […]?
You can get a […]:

```julia
using NCDatasets
typeof(variable(NCDataset("WOD-Salinity-Provencal.nc"), "Salinity"))
# returns NCDatasets.Variable{Float32,1}
```

It is a good idea to have an abstract type […]
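An abstract supertype lets you write generic methods, such as lookup of a dimension by name, once for both the raw and the decoded variable types. Below is a minimal sketch with hypothetical names: `AbstractDemoVariable`, `RawVar`, `ScaledVar`, `dimnames` and `dimindex`. These are not the NCDatasets types. `ScaledVar` only mimics the CF `scale_factor`/`add_offset` decoding.

```julia
# Hypothetical abstract supertype shared by all variable types.
abstract type AbstractDemoVariable{T,N} <: AbstractArray{T,N} end

# Raw stored values (stand-in for a plain, undecoded variable).
struct RawVar{T,N} <: AbstractDemoVariable{T,N}
    data::Array{T,N}
    dimnames::NTuple{N,String}
end
Base.size(v::RawVar) = size(v.data)
Base.getindex(v::RawVar{T,N}, I::Vararg{Int,N}) where {T,N} = v.data[I...]

# Decoded wrapper around a raw variable: value = raw * scale_factor + add_offset.
struct ScaledVar{T,N,V<:RawVar} <: AbstractDemoVariable{T,N}
    raw::V
    scale_factor::T
    add_offset::T
end
ScaledVar(raw::RawVar{S,N}, scale::T, offset::T) where {S,T,N} =
    ScaledVar{T,N,typeof(raw)}(raw, scale, offset)
Base.size(v::ScaledVar) = size(v.raw)
Base.getindex(v::ScaledVar{T,N}, I::Vararg{Int,N}) where {T,N} =
    v.raw[I...] * v.scale_factor + v.add_offset

# Only this accessor is type-specific...
dimnames(v::RawVar) = v.dimnames
dimnames(v::ScaledVar) = dimnames(v.raw)
# ...while this method is written once for every subtype.
dimindex(v::AbstractDemoVariable, name::String) = findfirst(==(name), dimnames(v))

raw = RawVar(Int16[1 2; 3 4], ("lon", "lat"))
sv = ScaledVar(raw, 0.5f0, 10.0f0)
sv[1, 2]                # 2 * 0.5f0 + 10.0f0 == 11.0f0
dimindex(sv, "lat")     # 2
```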
Concerning moving […]
Views are working fine for […]
This is solved in, e.g., ClimateBase.jl, where an automatic conversion to a […]
Hi there, I have a feature request which could be helpful, albeit only a small convenience syntax.
Let's say I have loaded an NCDataset. In Python this is done with `xarray`, but here we have something like […]. Now, let's say we have a field of this dataset, `fld = ebaf41_toa["toa_sw_all_mon"]`. In Python, you would do something like […]. `xarray` offers a convenience syntax like […], which means that you can select sub-parts of the field by specifying which ranges of its coordinates (the dimensions the field depends on) to keep.
This could be implemented here as well, but one has to somehow map the given keywords to the dimensions they refer to. I am happy to make this contribution if you are willing to guide me.
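You can map keywords to dimensions by matching each keyword name against the variable's dimension names. The given index or range then goes at that position, with `:` everywhere else. Below is a minimal sketch that assumes the dimension names are available as a tuple of strings; in NCDatasets they can be queried with `dimnames`. `kwview` is a hypothetical helper name. Selecting by coordinate values, as xarray's `sel` does, would also require converting values to indices first.

```julia
# Hypothetical helper: view `data` by dimension *name* instead of position.
function kwview(data::AbstractArray, dims::NTuple{N,String}; kwargs...) where {N}
    ndims(data) == N || throw(ArgumentError("one name per dimension is required"))
    idx = Any[Colon() for _ in 1:N]           # keep every dimension by default
    for (key, sel) in kwargs
        pos = findfirst(==(String(key)), dims)
        pos === nothing &&
            throw(ArgumentError("no dimension named $key; available: $(join(dims, ", "))"))
        idx[pos] = sel                        # index or range for that dimension
    end
    return view(data, idx...)                 # lazy: nothing is copied
end

fld = rand(360, 180, 12)
sub = kwview(fld, ("lon", "lat", "time"); time = 1:3, lat = 91:180)
size(sub)   # (360, 90, 3)
```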