-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestions on PR: Calculate per callset stats on gnomAD v4.1 per sample variant counts #614
Suggestions on PR: Calculate per callset stats on gnomAD v4.1 per sample variant counts #614
Conversation
- Make sure to filter to pass and capture variants for most filter options (need to discuss this more with the team)
makes sense enough, I think, but let me take another look at it in a bit. and this isn't supposed to add functionality for finding whole callset stats, yeah? just reorganizing and filter changes yeah ? |
Yeah, I haven't gotten to that PR yet, I will add my thoughts for the callset-wide stats to that PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense enough to me - left some comments where some more documentation is needed, and I might wanna try and run this all in a notebook tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sending my first round back for now.
Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com> Co-authored-by: Daniel Marten <78616802+matren395@users.noreply.github.com>
Change to use the metadata combinations to determine grouping instead of doing it separately Don't convert the summary stats array to rows, just keep as an array with associated metadata like we do for frequency
…n all possible filter groups was not requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sent these back so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished the whole thing, now at the getting meta function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it makes sense. I think we can move the first two functions to gnomad_methods, with a bit change on the docstring.
- The `filter_group_key_rename` parameter can be used to rename keys in the | ||
`all_sum_stat_filters`, `common_combo_override`, or `lof_combo_override` | ||
after creating all combinations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be:
- The `filter_group_key_rename` parameter can be used to rename keys in the | |
`all_sum_stat_filters`, `common_combo_override`, or `lof_combo_override` | |
after creating all combinations. | |
- The `filter_group_key_rename` parameter can be used to rename keys in the | |
generated filter combinations to a different set of keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it renames any key including in the override dictionaries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, we're very close.
common_filter_combos: List[List[str]] = None, | ||
common_combo_override: Dict[str, List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of renaming these parameters? I found common_combo cute but it's not very clear.
Maybe like this:
summary_stat_filters: Dict[str, List[str]],
common_filter_combinations: List[List[str]] = None,
common_filter_overrides: Dict[str, List[str]] = None,
lof_filter_combinations: Optional[List[List[str]]] = None,
lof_filter_overrides: Dict[str, List[str]] = None,
filter_key_renames: Dict[str, str] = None,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am keeping combinations as combos, it's a common way to shorten combinations similar to how we shorten frequency to freq, and we use "combo" or "combos" in other areas of our code to keep names shorter. I changed to use filter in the override names
# filter expression from a struct. | ||
f_expr = filter_exprs.get(f"{k}_{v}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give me an example for f_struct? I get it for loftee_no_flags
as f_expr.
Would f_struct for something like maf_af?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if you look at the gnomad_methods code, it can return a BooleanExpression or a StuctExpression of BooleanExpressions, max_af, csq_set,...
Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com>
…/github.com/broadinstitute/gnomad_qc into jg/move_some_stats_functionality_to_methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Relies on broadinstitute/gnomad_methods#701
This reorganizes the code to move building most of the filtering expressions in gnomad_methods. There might be more that we can move over to gnomad_methods, but I think we should get these suggestions merged into the main PR first.
I added some more combinations and a general filter for most of the groups to include pass filters and capture filters, but I would like to discuss this more to see what is best for us to do.
I added metadata similar to what we do in frequencies because I think it will be easier to filter to the specific grouping that is wanted.
I also switched to doing the aggregate over an array of the filter groups to prevent a class too large error, and I think also a nice speed-up