Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified annotate_sex() #427

Merged
merged 11 commits into from
Jan 3, 2022
Merged

Modified annotate_sex() #427

merged 11 commits into from
Jan 3, 2022

Conversation

wlu04
Copy link
Member

@wlu04 wlu04 commented Dec 14, 2021

No description provided.

@wlu04 wlu04 requested a review from jkgoodrich December 14, 2021 22:09
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions, but it's very possible I'm not following because I haven't worked with the VDS format as long as you. Thanks for putting in this PR!

)
mtds = hl.vds.split_multi(mtds, filter_changed_loci=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add filter_changed_loci as a parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that I look at it, do we need this? mt is set to mtds.variant_data, so

        mt = mt.filter_rows(
            (hl.len(mt.alleles) == 2) & hl.is_snp(mt.alleles[0], mt.alleles[1])
        )

should work on that mt right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is right, they work equivalently on mtds.variant_data. Sorry about that. And in this case, do we want to add a warning or something about using filter_chaged_loci before loading the data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it's fine without it, it's outside the scope of the function and the MT doesn't even need to be split before using it

gnomad/sample_qc/pipeline.py Outdated Show resolved Hide resolved
gnomad/sample_qc/pipeline.py Show resolved Hide resolved
mt, [hl.parse_locus_interval(contig) for contig in x_contigs]
)

if is_vds:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm not fully following this. For a VDS at this point mt is just the split biallelic variant data MT so filter_intervals should work fine on it right? And this only keeps the mtds.variant_data

@@ -282,6 +310,9 @@ def annotate_sex(
logger.info("Filtering to provided sites")
mt = mt.annotate_rows(**sites_ht[mt.row_key])
mt = mt.filter_rows(hl.is_defined(mt[aaf_expr]))
if is_vds:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? If only keeping mtds.variant_data and not using mtds after this, I don't think hl.vds.filter_variants is needed

wlu04 and others added 2 commits December 15, 2021 11:55
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close to ready, just a couple more comments (minus the addition of hl.vds.impute_sex_chromosome_ploidy to Hail for tests to pass)

:param bool is_sparse: Whether input MatrixTable is in sparse data format
:param excluded_intervals: Optional table of intervals to exclude from the computation.
:param included_intervals: Optional table of intervals to use in the computation. REQUIRED for exomes.
:param normalization_contig: Which chromosome to use to normalize sex chromosome coverage. Used in determining sex chromosome ploidies.
:param reference_genome: Reference genome used for constructing interval list. Default: 'GRCh38'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just infer this from the MT? example:

build = get_reference_genome(mt.locus).name

gnomad/sample_qc/pipeline.py Outdated Show resolved Hide resolved
wlu04 and others added 2 commits December 15, 2021 14:17
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

three more tiny suggestions, and it looks like hl.vds.impute_sex_chromosome_ploidy might now be in Hail, so a recommit should hopefully fix that error

gnomad/sample_qc/pipeline.py Outdated Show resolved Hide resolved
gnomad/sample_qc/pipeline.py Outdated Show resolved Hide resolved
gnomad/sample_qc/pipeline.py Show resolved Hide resolved
wlu04 and others added 4 commits December 17, 2021 10:00
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Co-authored-by: jkgoodrich <33063077+jkgoodrich@users.noreply.github.com>
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wlu04 wlu04 merged commit bbb4098 into master Jan 3, 2022
@wlu04 wlu04 deleted the annotate_sex branch January 3, 2022 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants