-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize compute_MVBS
and improve its efficiency for Dask input
#878
Conversation
… add get_MVBS_along_channels, which computes MVBS for each channel
…bins and means ping_time
…cho_range values for each ping time
…ock sv values, and test general method against mock values
… for MVBS testing
Codecov Report
@@ Coverage Diff @@
## dev #878 +/- ##
===========================================
+ Coverage 82.21% 94.73% +12.51%
===========================================
Files 53 1 -52
Lines 5022 38 -4984
===========================================
- Hits 4129 36 -4093
+ Misses 893 2 -891
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…old way of assuming echo_range is the same throughout ping_time
…thod by changing the bins produced by the resample method
…em, and then constructs the appropriate er_means array
…uping echo_range with respect to ping_time is needed
…to bin_and_mean_2d
…pute_MVBS methods only
@leewujung this PR is ready for a review. As we discussed yesterday, there are a couple of functions where docstrings/comments are needed. I have tried my best to mark these with TODO statements. I have decided not to incorporate the toggle for comprehensive/less comprehensive checks for After further discussion with @leewujung, we have decided to move the last bullet point above (Possibly replace |
…red for compute_MVBS
…nning, and calculating the mean of an array with respect to echo_range
@leewujung I have went through and documented/cleaned up the code I previously mentioned. This code is now in a state that I am happy with, please go ahead and review it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@b-reyes : I am done with my first round of review for now. I recommended some refactoring that should be trivial to do. I have some questions on why for loops instead of delayed functions are used in a couple of places when binned averages are computed. Is there a performance concern here if the bins are too small (i.e., contain too little number of values?). If that's case, I am also wondering about delaying bin_and_mean_2d
for each channel instead.
I have not reviewed the tests and also the content of the 2 is_grouping_needed_*
functions, but I can do that in the next round. Wanted to return this to you first so that I don't hold up the process. Thanks.
Ok @b-reyes and I resolved the main comments above and have the 2 items:
|
Co-authored-by: Wu-Jung Lee <leewujung@gmail.com>
…cho_range is needed
@leewujung I have responded to or addressed your comments. This PR is ready for another round of review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@b-reyes : Everything looks good! Once you added/changed the other two thing’s in the comments, please feel free to merge this. Great work!!
…ho_range to group_dig_er_bin_mean_echo_range
@leewujung I have addressed those comments. It looks like everything is in order here and the tests are passing. I will go ahead and merge it in shortly. |
…SOceanAcoustics#878) * establish core function for binning and computing the mean of a 2D array * finish structure of bin_and_mean_2d and create simple tests for it * modify bin_and_mean_2d so that it produces the appropriate values and add get_MVBS_along_channels, which computes MVBS for each channel * investigate more efficient dask graph * investigate chunking of MVBS computation * add new approach that computes mean binned echorange values and then bins and means ping_time * remove commented out lines and add a function for bin and meaning the times * start creating a bin and mean 2d method that accounts for different echo_range values for each ping time * replace previous bin and mean method with general method, construct mock sv values, and test general method against mock values * document and clean up code associated with creating the mock data set for MVBS testing * allow Sv and echo_range arrays to be dask arrays in mock data * add pytest fixture to test_bin_and_mean_2d * begin documenting all functions needed to compute MVBS along a channel * finish documenting get_MVBS_along_channels function * incorporate new bin and mean method into compute_MVBS and go back to old way of assuming echo_range is the same throughout ping_time * make the MVBS output values of new method be equivalent to the old method by changing the bins produced by the resample method * add code section that groups different echo_range values, computes them, and then constructs the appropriate er_means array * add two functions (less and more comprehensive) that determine if grouping echo_range with respect to ping_time is needed * modify get_unequal_rows to account for dask input and add bool input to bin_and_mean_2d * add seed to test_bin_and_mean_2d and randomly choose a ping bin to be completely NaN * remove old functions associated with compute_MVBS and use the new compute_MVBS methods only * comment and add docstrings to previously undocumented functions required for compute_MVBS * clean up bin_and_mean_2d code by creating a function for grouping, binning, and calculating the mean of an array with respect to echo_range * catch warnings associated with taking the nanmean of an array filled with NaNs * provide more specific docstring for arr in bin_and_mean_echo_range Co-authored-by: Wu-Jung Lee <leewujung@gmail.com> * refactor and rename code associated with determining if grouping of echo_range is needed * move all core routines needed for compute_MVBS into the python script mvbs.py * add note to docstring of bin_and_mean_2d and rename group_bin_mean_echo_range to group_dig_er_bin_mean_echo_range Co-authored-by: Wu-Jung Lee <leewujung@gmail.com>
This PR addresses #662 and #845 by directly calculating
MVBS
in an efficient manner. This new approach allows us to applycompute_MVBS
onecho_range
data sizes that vary withping_time
andSv
values that are Dask arrays. In addition to the new core routines that computeMVBS
, unit tests for the calculation ofMVBS
were also completed. These tests construct mockSv
values with knownMVBS
values.Please note that this PR is currently a draft. The following items need to be completed:
get_MVBS_along_channels
inpreprocess/api.py
ds_MVBS
calculation incompute_MVBS
with the new and improvedget_MVBS_along_channels
functionecho_range
must be the same for eachping_time
compute_MVBS
function works well with the 19 files in the OOI eclipse notebookPossibly replacetest_compute_MVBS
with a more general test using the ideas intest_bin_and_mean_2d