-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modified annotate_sex() #427
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions, but it's very possible I'm not following because I haven't worked with the VDS format as long as you. Thanks for putting in this PR!
gnomad/sample_qc/pipeline.py
Outdated
) | ||
mtds = hl.vds.split_multi(mtds, filter_changed_loci=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add filter_changed_loci
as a parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I look at it, do we need this? mt
is set to mtds.variant_data
, so
mt = mt.filter_rows(
(hl.len(mt.alleles) == 2) & hl.is_snp(mt.alleles[0], mt.alleles[1])
)
should work on that mt
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is right, they work equivalently on mtds.variant_data
. Sorry about that. And in this case, do we want to add a warning or something about using filter_chaged_loci
before loading the data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think it's fine without it, it's outside the scope of the function and the MT doesn't even need to be split before using it
gnomad/sample_qc/pipeline.py
Outdated
mt, [hl.parse_locus_interval(contig) for contig in x_contigs] | ||
) | ||
|
||
if is_vds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm not fully following this. For a VDS at this point mt
is just the split biallelic variant data MT so filter_intervals
should work fine on it right? And this only keeps the mtds.variant_data
gnomad/sample_qc/pipeline.py
Outdated
@@ -282,6 +310,9 @@ def annotate_sex( | |||
logger.info("Filtering to provided sites") | |||
mt = mt.annotate_rows(**sites_ht[mt.row_key]) | |||
mt = mt.filter_rows(hl.is_defined(mt[aaf_expr])) | |||
if is_vds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? If only keeping mtds.variant_data
and not using mtds
after this, I don't think hl.vds.filter_variants
is needed
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is close to ready, just a couple more comments (minus the addition of hl.vds.impute_sex_chromosome_ploidy
to Hail for tests to pass)
gnomad/sample_qc/pipeline.py
Outdated
:param bool is_sparse: Whether input MatrixTable is in sparse data format | ||
:param excluded_intervals: Optional table of intervals to exclude from the computation. | ||
:param included_intervals: Optional table of intervals to use in the computation. REQUIRED for exomes. | ||
:param normalization_contig: Which chromosome to use to normalize sex chromosome coverage. Used in determining sex chromosome ploidies. | ||
:param reference_genome: Reference genome used for constructing interval list. Default: 'GRCh38' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just infer this from the MT? example:
gnomad_methods/gnomad/utils/filtering.py
Line 133 in 978b577
build = get_reference_genome(mt.locus).name |
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
three more tiny suggestions, and it looks like hl.vds.impute_sex_chromosome_ploidy
might now be in Hail, so a recommit should hopefully fix that error
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.