Add generic constraint functions: get_downsamplings(), remove_coverage_outliers(), and filter_for_mu() #507

Merged
merged 15 commits into from
Apr 26, 2023

Conversation

@averywpx averywpx commented Dec 1, 2022

No description provided.

@averywpx averywpx requested a review from klaricch December 1, 2022 16:55
@averywpx averywpx self-assigned this Dec 1, 2022
@jkgoodrich jkgoodrich requested review from jkgoodrich and removed request for klaricch December 1, 2022 18:23
@jkgoodrich jkgoodrich self-assigned this Dec 1, 2022
Comment on lines 461 to 466
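```
# Assumes `import hail as hl`, `import numpy as np`, and that `ht` and `context_ht` refer to the same Table.
summary_hist = ht.aggregate(hl.struct(gerp=hl.agg.hist(context_ht.gerp, -12.3, 6.17, 100)))
cumulative_data = np.cumsum(summary_hist.gerp.bin_freq) + summary_hist.gerp.n_smaller
# np.append returns a new array, so assign the result.
cumulative_data = np.append(cumulative_data, [cumulative_data[-1] + summary_hist.gerp.n_larger])
list(zip(summary_hist.gerp.bin_edges, cumulative_data / max(cumulative_data)))
```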

Does this need to be in a function of some sort?

moved to constraint repo but still need to look into this more closely
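
One way to address the question above would be to wrap the NumPy portion of the snippet in a small helper. The following is only a sketch: the function name `cumulative_fraction_by_bin_edge` and its parameters are hypothetical, chosen to mirror the `bin_edges`, `bin_freq`, `n_smaller`, and `n_larger` fields of the histogram result used in the snippet. It is not the code that was moved to the constraint repo.

```python
import numpy as np


def cumulative_fraction_by_bin_edge(bin_edges, bin_freq, n_smaller=0, n_larger=0):
    """
    Pair each histogram bin edge with a cumulative fraction of observations.

    Hypothetical helper that mirrors the snippet's arithmetic: entry i is the
    count through bin i (plus `n_smaller`), and the final entry adds `n_larger`.
    """
    cumulative = np.cumsum(bin_freq) + n_smaller
    # np.append returns a new array, so the result must be assigned.
    cumulative = np.append(cumulative, [cumulative[-1] + n_larger])
    # Counts are non-negative, so the last entry is also the maximum.
    total = cumulative[-1]
    if total == 0:
        raise ValueError("Histogram contains no observations.")
    return [(float(e), float(c)) for e, c in zip(bin_edges, cumulative / total)]
```

With a histogram result such as `summary_hist.gerp` from the snippet, the call would presumably look like `cumulative_fraction_by_bin_edge(h.bin_edges, h.bin_freq, h.n_smaller, h.n_larger)`, where `h` is that histogram struct.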

@klaricch klaricch requested a review from jkgoodrich April 20, 2023 15:56
@jkgoodrich jkgoodrich force-pushed the constraint_mutation_rate_table branch from 6883590 to 8d00de2 Compare April 24, 2023 17:14
`freq_meta_expr`. Default is 'adj'.
"""
indices = hl.enumerate(freq_meta_expr).filter(
lambda f: (f[1].size() == 3)

add comment for why 3

Actually, this doesn't seem to be needed: anything filtered to 'downsampling' already has a dict size of 3.

klaricch and others added 4 commits April 26, 2023 10:53
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
@klaricch klaricch requested a review from jkgoodrich April 26, 2023 16:57
`freq_meta_expr`. Default is 'adj'.
"""
indices = hl.enumerate(freq_meta_expr).filter(
lambda f: (f[1].get("group") == variant_quality)

Just confirming that lambda f: (f[1].size() == 3) is not needed. Is there no case where we would have both {'downsampling': '5000', 'group': 'adj', 'pop': 'global'} and {'downsampling': '5000', 'group': 'adj', 'pop': 'global', 'other_strata': 'some_val'}, and want only the former?

At least not in the existing datasets, but I could leave it in to be on the safe side.

I think it's fine for now and we can update if our downsampling frequencies get more complex
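
To make the trade-off in this thread concrete, here is a plain-Python sketch of the two filters over a list of frequency metadata dicts. It does not use Hail. `select_downsampling_indices` and its `require_exact_keys` flag are hypothetical names, and the `pop` and `downsampling` conditions are assumed from the example dicts in the comment above rather than taken from the merged code.

```python
def select_downsampling_indices(
    freq_meta, variant_quality="adj", pop="global", require_exact_keys=False
):
    """
    Return indices of metadata dicts that describe a downsampling.

    Hypothetical plain-Python analogue of the filter discussed in the review.
    When `require_exact_keys` is True, dicts carrying any strata beyond
    'downsampling', 'group', and 'pop' are excluded (the `size() == 3` check).
    """
    indices = []
    for i, meta in enumerate(freq_meta):
        if meta.get("group") != variant_quality:
            continue
        if meta.get("pop") != pop:
            continue
        if "downsampling" not in meta:
            continue
        if require_exact_keys and len(meta) != 3:
            continue
        indices.append(i)
    return indices


freq_meta = [
    {"group": "adj"},
    {"downsampling": "5000", "group": "adj", "pop": "global"},
    {"downsampling": "5000", "group": "adj", "pop": "global", "other_strata": "some_val"},
    {"downsampling": "10", "group": "raw", "pop": "global"},
]

without_size_check = select_downsampling_indices(freq_meta)
with_size_check = select_downsampling_indices(freq_meta, require_exact_keys=True)
```

On this made-up input the two calls differ only in whether the entry with `other_strata` is kept, which is the case the reviewers agreed does not occur in the existing datasets.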

klaricch and others added 3 commits April 26, 2023 14:15
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
@klaricch klaricch requested a review from jkgoodrich April 26, 2023 18:39
@klaricch klaricch merged commit 6afcc6a into main Apr 26, 2023
@klaricch klaricch deleted the constraint_mutation_rate_table branch April 26, 2023 18:40