Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional parameters to set_female_y_metrics_to_na_expr to use other frequency fields #635

Merged
merged 1 commit into from
Oct 24, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 24 additions & 10 deletions gnomad/utils/annotations.py
Original file line number Diff line number Diff line change
Expand Up @@ -1046,31 +1046,45 @@ def missing_callstats_expr() -> hl.expr.StructExpression:


def set_female_y_metrics_to_na_expr(
t: Union[hl.Table, hl.MatrixTable]
t: Union[hl.Table, hl.MatrixTable],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be a breaking change, but we could avoid passing in a Table or MatrixTable if freq_expr, freq_meta_expr, and freq_index_dict_expr only accept HailExpressions, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but I don't feel like going through an changing the things that we already ran this release that use that, I think it's cleaner to just leave as is for the next pypi release

freq_expr: Union[hl.expr.ArrayExpression, str] = "freq",
freq_meta_expr: Union[hl.expr.ArrayExpression, str] = "freq_meta",
freq_index_dict_expr: Union[hl.expr.DictExpression, str] = "freq_index_dict",
) -> hl.expr.ArrayExpression:
"""
Set Y-variant frequency callstats for female-specific metrics to missing structs.

.. note:: Requires freq, freq_meta, and freq_index_dict annotations to be present in Table or MatrixTable
:param t: Table or MatrixTable for which to adjust female metrics.
:param freq_expr: Array expression or string annotation name for the frequency
array. Default is "freq".
:param freq_meta_expr: Array expression or string annotation name for the frequency
metadata. Default is "freq_meta".
:param freq_index_dict_expr: Dict expression or string annotation name for the
frequency metadata index dictionary. Default is "freq_index_dict".
:return: Hail array expression to set female Y-variant metrics to missing values.
"""
if isinstance(freq_expr, str):
freq_expr = t[freq_expr]
if isinstance(freq_meta_expr, str):
freq_meta_expr = t[freq_meta_expr]
if isinstance(freq_index_dict_expr, str):
freq_index_dict_expr = t[freq_index_dict_expr]

:param t: Table or MatrixTable for which to adjust female metrics
:return: Hail array expression to set female Y-variant metrics to missing values
"""
female_idx = hl.map(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should make this

    xx_idx = hl.map(

since we've been moving away from using gender terms for our sex assignments (though I guess maybe we should rename the function also, which is also a breaking change)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a post v4 change, we can do an overhaul for that and population

lambda x: t.freq_index_dict[x],
hl.filter(lambda x: x.contains("XX"), t.freq_index_dict.keys()),
lambda x: freq_index_dict_expr[x],
hl.filter(lambda x: x.contains("XX"), freq_index_dict_expr.keys()),
)
freq_idx_range = hl.range(hl.len(t.freq_meta))
freq_idx_range = hl.range(hl.len(freq_meta_expr))

new_freq_expr = hl.if_else(
(t.locus.in_y_nonpar() | t.locus.in_y_par()),
hl.map(
lambda x: hl.if_else(
female_idx.contains(x), missing_callstats_expr(), t.freq[x]
female_idx.contains(x), missing_callstats_expr(), freq_expr[x]
),
freq_idx_range,
),
t.freq,
freq_expr,
)

return new_freq_expr
Expand Down