Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize echopype software provenance and apply to Sv and MVBS #621

Merged
merged 12 commits into from
May 5, 2022

Conversation

emiliom
Copy link
Collaborator

@emiliom emiliom commented Apr 10, 2022

Addresses #618. Let's keep it as a draft PR for now, for discussion.

I've abstracted the assignment of the current Provenance software attributes into a function in the new module utils.prov.py. This function now handles both conversion_* and processing_* attributes in a single place. Also eliminated existing duplication of attribute assignments.

Note the additional (not from the convention) global attribute I've added as a suggestion for Sv and MVBS: processing_function. It would allow us to record the actual function used.

Doesn't address the move from src_filenames global attribute to a source_filenames variable (#620)

@emiliom emiliom added the data format Anything about data format label Apr 10, 2022
@emiliom emiliom added this to the 0.6.0 milestone Apr 10, 2022
@emiliom emiliom requested a review from leewujung April 10, 2022 21:49
@emiliom emiliom self-assigned this Apr 10, 2022
@emiliom emiliom marked this pull request as draft April 10, 2022 21:49
@codecov-commenter
Copy link

codecov-commenter commented Apr 10, 2022

Codecov Report

Merging #621 (c1ebfb3) into dev (6bec3e4) will decrease coverage by 9.17%.
The diff coverage is 65.21%.

@@            Coverage Diff             @@
##              dev     #621      +/-   ##
==========================================
- Coverage   78.75%   69.57%   -9.18%     
==========================================
  Files          42       37       -5     
  Lines        3784     3744      -40     
==========================================
- Hits         2980     2605     -375     
- Misses        804     1139     +335     
Flag Coverage Δ
unittests 69.57% <65.21%> (-9.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
echopype/echodata/combine.py 63.55% <0.00%> (-10.40%) ⬇️
echopype/utils/prov.py 0.00% <0.00%> (ø)
echopype/calibrate/api.py 60.41% <63.63%> (-23.37%) ⬇️
echopype/preprocess/api.py 74.19% <92.30%> (-16.01%) ⬇️
echopype/convert/set_groups_base.py 88.04% <100.00%> (-2.28%) ⬇️
echopype/convert/set_groups_ek60.py 91.58% <100.00%> (ø)
echopype/preprocess/noise_est.py 73.68% <100.00%> (-20.61%) ⬇️
echopype/utils/coding.py 0.00% <0.00%> (-100.00%) ⬇️
echopype/echodata/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
echopype/calibrate/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
... and 25 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

}
if source_files:
# TODO: src_filenames will be replaced with a new variable, source_filenames
# Also, come to think of it, source files is not "echopype provenance" info per se
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that source_files is part of the convention and not from echopype? Just wanted to make sure I understand this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant with "source files is not "echopype provenance" info per se" was that all the other attributes I've included in echopype_prov_attrs are about the echopype software itself, but source files is not. Though, come to think of it, strictly speaking, conversion_time is not either .... Hmm. We could rename echopype_prov_attrs to something more general. Its scope is the software used (name & version) and basic info about its application (timestamp when it was run, and input files used -- though the latter will be moved to a variable and will no longer be an attribute)

Copy link
Member

@leewujung leewujung May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again and seeing how processing_function is currently separately done in calibrate.api._compute_cal, I wonder if we could rename this to echopype_ver_time_files_prov so that it is clear what this function does, for when we come back to provenance in the future.

Copy link
Collaborator Author

@emiliom emiliom May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds good. It sounds like you're also saying that we keep the source_files provenance functionality in this function, even though it'll be changed to add xarray variables rather than an attribute. Correct? I haven't thought through the consequences of doing that for this function, given that echopype_prov_attrs currently just populates a dict and returns it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you're also saying that we keep the source_files provenance functionality in this function, even though it'll be changed to add xarray variables rather than an attribute. Correct?

I didn't think that far... ha. I think you are right that since it will be a data variable not an attribute, it should not be here. In that case, the function name would become echopype_ver_time_prov.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 21, 2022

Note the additional (not from the convention) global attribute I've added as a suggestion for Sv and MVBS: processing_function. It would allow us to record the actual function used.

@leewujung this is what I was referring to.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 21, 2022

TODO: Add software provenance attributes to remove_noise output dataset. I'll add a commit to this PR.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 28, 2022

TODO: For EK80, add waveform_mode and encode_mode parameters passed to compute_Sv and compute_TS (and pass them to MVBS too?). These parameters are not used in any other sensors at this time, so no attributes will be added.

@leewujung
Copy link
Member

Yes please pass these to MVBS also, as part of the chain of provenance (is there a specific term for this chain?).

@emiliom
Copy link
Collaborator Author

emiliom commented May 1, 2022

@leewujung I've pushed several commits that include multiple improvements, some of which are not about provenance per se.

I decided not to try to implement "provenance chain", where, say, parameters used in computing Sv are passed downstream to MVBS. I think that's a larger topic. Let's revisit it after 0.6.0 is out, at some point when we focus on provenance again.

@emiliom emiliom marked this pull request as ready for review May 2, 2022 17:39
Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a change to remove the unused datetime import since GH doesn't allow me to comment on unchanged code.

I added a few suggestions, but otherwise this looks ready to go!

FYI- I pondered a bit about the long_name you added for Sv_noise and Sv_corrected, but decided that what you have is appropriate.

@emiliom
Copy link
Collaborator Author

emiliom commented May 5, 2022

I pushed a change to remove the unused datetime import since GH doesn't allow me to comment on unchanged code.

Thanks!

FYI- I pondered a bit about the long_name you added for Sv_noise and Sv_corrected, but decided that what you have is appropriate.

Great. And yeah, I struggled a bit with that.

I added a few suggestions, but otherwise this looks ready to go!

Great! But since I've made some follow-up comments to your inline comments, I'll give you more time to react. Let's plan to merge today, if possible.

@leewujung
Copy link
Member

leewujung commented May 5, 2022

I think I reacted to everything, haha. I can't wait to see this merged!! 😎

@emiliom
Copy link
Collaborator Author

emiliom commented May 5, 2022

Thank you!! Merging now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data format Anything about data format
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants