Suggestions on PR: Calculate per callset stats on gnomAD v4.1 per sample variant counts #614

jkgoodrich · 2024-05-10T16:21:36Z

Relies on broadinstitute/gnomad_methods#701

This reorganizes the code so that most of the filtering expressions are built in gnomad_methods. There might be more that we can move over to gnomad_methods, but I think we should get these suggestions merged into the main PR first.

I added some more combinations and a general filter for most of the groups to include pass filters and capture filters, but I would like to discuss this more to see what is best for us to do.

I added metadata similar to what we do in frequencies because I think it will be easier to filter to the specific grouping that is wanted.
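To make the metadata idea concrete, here is a minimal plain-Python sketch (not the Hail code in this PR). The group values, the `n_non_ref` field, and the `find_group_index` helper are all made up for illustration. The point is only that the stats stay in one array and a parallel metadata list says what each position means, similar to the frequency arrays.

```python
from typing import Dict, List

# Hypothetical metadata: one dict per filter group, in array order.
filter_group_meta: List[Dict[str, str]] = [
    {"group": "no_filter"},
    {"variant_qc": "pass"},
    {"variant_qc": "pass", "capture": "ukb_and_broad"},
]

# Hypothetical per-sample stats: position i belongs to filter_group_meta[i].
sample_stats: List[Dict[str, int]] = [
    {"n_non_ref": 120},
    {"n_non_ref": 95},
    {"n_non_ref": 80},
]


def find_group_index(meta: List[Dict[str, str]], wanted: Dict[str, str]) -> int:
    """Return the position of the group whose metadata equals `wanted`."""
    for i, m in enumerate(meta):
        if m == wanted:
            return i
    raise KeyError(f"No filter group with metadata {wanted}")


# Pick out one specific grouping by its metadata rather than by position.
idx = find_group_index(
    filter_group_meta, {"variant_qc": "pass", "capture": "ukb_and_broad"}
)
pass_capture_stats = sample_stats[idx]
```

Looking a group up by its metadata means callers never need to know the array order.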

I also switched to aggregating over an array of the filter groups to prevent a "class too large" error, and I think it is also a nice speed-up.
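A rough plain-Python analogy of the array approach, assuming made-up group names and a hypothetical `filter_flags` helper; this is not the Hail code in the PR. Each variant yields one boolean per filter group, and a single pass accumulates counts into one parallel array instead of keeping a separately named aggregation for every group.

```python
from typing import Dict, List

# Hypothetical filter-group names; the real groups are built in the PR.
FILTER_GROUPS = ["all", "pass", "pass_and_capture"]


def filter_flags(variant: Dict[str, bool]) -> List[bool]:
    """Return one boolean per filter group, in FILTER_GROUPS order."""
    return [
        True,
        variant["pass"],
        variant["pass"] and variant["capture"],
    ]


def count_per_group(variants: List[Dict[str, bool]]) -> List[int]:
    """Count, for each filter group, how many variants satisfy it."""
    counts = [0] * len(FILTER_GROUPS)
    for variant in variants:
        for i, keep in enumerate(filter_flags(variant)):
            counts[i] += int(keep)
    return counts


variants = [
    {"pass": True, "capture": True},
    {"pass": True, "capture": False},
    {"pass": False, "capture": True},
]
counts = count_per_group(variants)
```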

- Make sure to filter to pass and capture variants for most filter options (need to discuss this more with the team)

matren395 · 2024-05-13T15:50:54Z

makes sense enough, I think, but let me take another look at it in a bit.

and this isn't supposed to add functionality for finding whole callset stats, yeah? just reorganizing and filter changes yeah ?

jkgoodrich · 2024-05-13T16:44:00Z

> and this isn't supposed to add functionality for finding whole callset stats, yeah? just reorganizing and filter changes yeah ?

Yeah, I haven't gotten to that PR yet. I will add my thoughts on the callset-wide stats there.

matren395

makes sense enough to me - left some comments where some more documentation is needed, and I might wanna try and run this all in a notebook tomorrow

gnomad_qc/v4/assessment/calculate_per_sample_stats.py

KoalaQin

I'm sending my first round back for now.

KoalaQin

I sent these back so far.

KoalaQin

I haven't finished reviewing the whole thing; I'm currently at the function that gets the filter group meta.

KoalaQin

Yes, it makes sense. I think we can move the first two functions to gnomad_methods, with a small change to the docstring.

KoalaQin · 2024-05-24T13:30:05Z

gnomad_qc/v4/assessment/calculate_per_sample_stats.py

+        - The `filter_group_key_rename` parameter can be used to rename keys in the
+          `all_sum_stat_filters`, `common_combo_override`, or `lof_combo_override`
+          after creating all combinations.


I think this should be:

Suggested change

-        - The `filter_group_key_rename` parameter can be used to rename keys in the
-          `all_sum_stat_filters`, `common_combo_override`, or `lof_combo_override`
-          after creating all combinations.
+        - The `filter_group_key_rename` parameter can be used to rename keys in the
+          generated filter combinations to a different set of keys.

jkgoodrich · May 24, 2024

No, it renames any key, including those in the override dictionaries.
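As a small illustration of that point, here is a hedged plain-Python sketch. The `rename_group_keys` helper and the example keys are hypothetical and are not the gnomad_methods implementation; the sketch only shows a rename map being applied to every group after the combinations exist, whether a group was generated or came from an override dictionary.

```python
from typing import Dict, List


def rename_group_keys(
    groups: List[Dict[str, str]], key_rename: Dict[str, str]
) -> List[Dict[str, str]]:
    """Rename keys in every group; keys without a mapping are kept as-is."""
    return [{key_rename.get(k, k): v for k, v in g.items()} for g in groups]


# Hypothetical groups: the first two stand in for generated combinations,
# the last one for a group that came from an override dictionary.
generated = [{"capture": "ukb", "variant_qc": "pass"}, {"variant_qc": "pass"}]
from_override = [{"variant_ac": "singleton", "variant_qc": "pass"}]

renamed = rename_group_keys(
    generated + from_override, {"variant_qc": "qc", "variant_ac": "ac"}
)
```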

KoalaQin

A few comments, we're very close.

KoalaQin · 2024-05-24T13:49:21Z

gnomad_qc/v4/assessment/calculate_per_sample_stats.py

+    common_filter_combos: List[List[str]] = None,
+    common_combo_override: Dict[str, List[str]] = None,


What do you think of renaming these parameters? I found common_combo cute but it's not very clear.
Maybe like this:

    summary_stat_filters: Dict[str, List[str]],
    common_filter_combinations: List[List[str]] = None,
    common_filter_overrides: Dict[str, List[str]] = None,
    lof_filter_combinations: Optional[List[List[str]]] = None,
    lof_filter_overrides: Dict[str, List[str]] = None,
    filter_key_renames: Dict[str, str] = None,

jkgoodrich · May 24, 2024

I am keeping "combos" rather than "combinations". It's a common shortening, similar to how we shorten frequency to freq, and we use "combo" or "combos" in other areas of our code to keep names shorter. I did change the override names to use "filter".

KoalaQin · 2024-05-24T15:46:30Z

gnomad_qc/v4/assessment/calculate_per_sample_stats.py

+            # filter expression from a struct.
+            f_expr = filter_exprs.get(f"{k}_{v}")


Can you give me an example for f_struct? I get it for loftee_no_flags as f_expr.
Would f_struct be for something like max_af?

jkgoodrich · May 24, 2024

Yes, if you look at the gnomad_methods code, it can return either a BooleanExpression or a StructExpression of BooleanExpressions (max_af, csq_set, ...).
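To sketch the two cases in plain Python: a flat boolean stored under `"<key>_<value>"`, versus a struct-like value stored under `"<key>"` with one field per value. The `lookup_filter` helper, the fallback order, and the example entries are assumptions for illustration, with dicts standing in for Hail structs; this is not the code in the PR.

```python
from typing import Any, Dict, Optional


def lookup_filter(filter_exprs: Dict[str, Any], k: str, v: str) -> Optional[bool]:
    """Find the filter for key `k` and value `v` in either storage form."""
    # Case 1: a flat boolean stored under "<key>_<value>".
    f_expr = filter_exprs.get(f"{k}_{v}")
    if f_expr is not None:
        return f_expr
    # Case 2: a struct-like dict under "<key>" with one field per value.
    f_struct = filter_exprs.get(k)
    if f_struct is not None:
        return f_struct.get(v)
    return None


# Hypothetical values for a single variant.
filter_exprs = {
    "loftee_no_flags": True,
    "max_af": {"0.01": False, "0.0001": True},
}

flat_result = lookup_filter(filter_exprs, "loftee", "no_flags")
struct_result = lookup_filter(filter_exprs, "max_af", "0.01")
missing_result = lookup_filter(filter_exprs, "csq_set", "lof")
```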

KoalaQin

LGTM!

jkgoodrich added 6 commits May 9, 2024 08:23

Move some of the filter expression functionality to gnomad_methods

98e7cd1

Changes needed while testing

61e8638

Convert filtering group to array to avoid class too large error

10b16bb

Clean-up main a bit

4a74c99

Clean-up main a bit

f2180eb

- Add some clear metadata for the filter groups for easier parsing.

369b1f5

- Make sure to filter to pass and capture variants for most filter options (need to discuss this more with the team)

jkgoodrich added the v4.1 label May 10, 2024

jkgoodrich requested review from KoalaQin and matren395 May 10, 2024 16:21

jkgoodrich assigned jkgoodrich, KoalaQin and matren395 May 10, 2024

jkgoodrich changed the base branch from main to dm/per_sample_counts_4_1 May 10, 2024 16:21

jkgoodrich added the Release Stats label May 10, 2024

jkgoodrich added 2 commits May 10, 2024 10:43

Annotate globals onto the output HT

42675ff

Add extra common filter group

4d9312d

Add extra common filter group

2c3944e

matren395 reviewed May 16, 2024

matren395 reviewed May 17, 2024

KoalaQin requested changes May 20, 2024

jkgoodrich and others added 5 commits May 20, 2024 14:30

Add more detained doc string to get_summary_stats_filter_groups_ht

d44d6d0

Apply suggestions from code review

8b529b8

Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com>
Co-authored-by: Daniel Marten <78616802+matren395@users.noreply.github.com>

Only define COMMON_FILTER_COMBOS once

1e7b5f7

Don't include keys in the naming if they have csq in the name

4c0d684

Add clear comments and documentation to get_filter_group_meta

374c3fb

jkgoodrich requested review from matren395 and KoalaQin May 20, 2024 23:01

Make various changes to names of filter groups and filter group meta

4ccf6a9

Change to use the metadata combinations to determine grouping instead of doing it separately
Don't convert the summary stats array to rows, just keep as an array with associated metadata like we do for frequency

jkgoodrich added 5 commits May 22, 2024 08:09

remove unused imports

61b010d

Update comments and doctrings, primarily for get_filter_group_meta

043d5e1

Update docstring in get_summary_stats_filter_groups_ht

295a6c3

Update compute_agg_sample_stats to work with the new input format

47787f5

loftee_labels -> loftee_label and add logger warn if a filter group in all possible filter groups was not requested

7a618f2

KoalaQin requested changes May 23, 2024

jkgoodrich added 2 commits May 23, 2024 11:29

Fix example in get_filter_group_meta

b2ac083

Update globals to create missing loftee combinations

a365857

jkgoodrich requested a review from KoalaQin May 23, 2024 18:17

jkgoodrich added 3 commits May 23, 2024 12:33

Move loftee flags to only LOF_FILTERS_FOR_COMBO

3def23a

Move loftee label to only LOF_FILTERS_FOR_COMBO

629dc93

loftee_HC -> loftee_label

a33e653

KoalaQin requested changes May 23, 2024

jkgoodrich added 2 commits May 23, 2024 15:47

Add more options to the get_filter_group_meta docstring

4fb0fd6

Make parameters optional in get_filter_group_meta so it's more generalized for gnomad_methods

c7be055

jkgoodrich commented May 23, 2024

jkgoodrich requested a review from KoalaQin May 23, 2024 22:26

KoalaQin requested changes May 24, 2024

Update comment

a595372

jkgoodrich requested a review from KoalaQin May 24, 2024 15:34

KoalaQin requested changes May 24, 2024

jkgoodrich and others added 3 commits May 24, 2024 09:57

Apply suggestions from code review

643c03a

Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com>

Rename arguments in get_filter_group_meta

1ab0200

Merge branch 'jg/move_some_stats_functionality_to_methods' of https://github.com/broadinstitute/gnomad_qc into jg/move_some_stats_functionality_to_methods

6f9c331

jkgoodrich requested a review from KoalaQin May 24, 2024 16:14

Move generate_filter_combinations and get_filter_group_meta to gnomad_methods

8acafb4

KoalaQin approved these changes May 24, 2024

jkgoodrich merged commit afed864 into dm/per_sample_counts_4_1 May 24, 2024
2 checks passed

jkgoodrich deleted the jg/move_some_stats_functionality_to_methods branch May 24, 2024 17:00
