-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multiple problems with mishandling of missing entries in compute_… #242
Conversation
…coverage_stats in sparse_mt.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a small question about the docstring, but otherwise, this looks good to me. thank you for the detective work!!
gnomad/utils/sparse_mt.py
Outdated
@@ -654,7 +654,7 @@ def compute_coverage_stats( | |||
mt = mt.select_entries("END", "DP") | |||
col_key_fields = list(mt.col_key) | |||
t = mt._localize_entries("__entries", "__cols") | |||
t = t.join(reference_ht.annotate(_in_ref=True), how="outer") | |||
t = t.join(reference_ht.select(_in_ref=True).key_by("locus"), how="outer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment about the reference HT still relevant in the docstring?
It needs to be keyed with the same keys as `mt`, typically either `locus` or `locus, alleles`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! fixed.
@@ -669,14 +669,17 @@ def compute_coverage_stats( | |||
# Filter rows where the reference is missing | |||
mt = mt.filter_rows(mt._in_ref) | |||
|
|||
# Unfilter entries so that entries with no ref block overlap aren't null | |||
mt = mt.unfilter_entries() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh, great catch!! I didn't realize you needed this and tested it out in a notebook to confirm.
…tation of coverage from sparse mt
@ch-kr , I know you already approved it but I had to make a few changes for this to run (compute efficiency) and added the new version of coverage to the resources so if you don't mind it'd be great if you could take a quick look at that last commit! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny question, but this looks good to me
@@ -2,6 +2,7 @@ | |||
|
|||
## Unreleased | |||
|
|||
* Fixed handling of missing entries (not within a ref block / alt site) when computing `coverage_stats` in `sparse_mt.py` [[#242]] (/~https://github.com/broadinstitute/gnomad_methods/pull/242) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be helpful to include a note about the resource changes here too
gnomad/utils/reference_genome.py
Outdated
_context = hl.utils.range_table( | ||
ref.contig_length(contig), | ||
n_partitions=int(ref.contig_length(contig) / 5000000), | ||
n_partitions=n_partitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this creating tables with 0 partitions before you added the line of code with max
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly! I got into troubles when I forgot to remove all the silly GL*
contigs
…coverage_stats in sparse_mt.py