Fix bug in `process_consequences` that was introduced when adding support for VEP without polyphen #710

jkgoodrich · 2024-06-14T04:08:10Z

This PR fixes the following problems in the current `process_consequences`.

There are several things wrong with this part of the code:

  modifier = _csq_score(tc)
  flag_condition = (tc.lof == "HC") & (tc.lof_flags != "")
  modifier -= hl.if_else(flag_condition, flag_score, no_flag_score)
  modifier -= hl.if_else(tc.lof == "OS", 20, 0)
  modifier -= hl.if_else(tc.lof == "LC", 10, 0)

I don't think the modifier is supposed to start from `modifier = _csq_score(tc)`, since the base score is already included by:

  tcl = tcl.map(
      lambda tc: tc.annotate(csq_score=_csq_score(tc) - _csq_score_modifier(tc))
  )
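A plain-Python sketch of why seeding the modifier with `_csq_score(tc)` is a problem. This is not the vep.py code. The helper names are hypothetical, the base scores 3 and 25 and the adjustment of 500 are made-up numbers, and the signs only mirror the quoted snippet. Because `csq_score = base - (base - adjustment)`, the base score cancels out, so two transcripts with different consequences get the same score.

```python
def csq_score(base: float, modifier: float) -> float:
    # As in the tcl.map(...) call: csq_score = _csq_score(tc) - _csq_score_modifier(tc)
    return base - modifier


def buggy_modifier(base: float, lof_adjustment: float) -> float:
    # As in the quoted snippet: the modifier is seeded with _csq_score(tc),
    # then the LoF adjustment is subtracted from it.
    return base - lof_adjustment


def fixed_modifier(lof_adjustment: float) -> float:
    # Same snippet, but seeded with 0, so the base score is only counted once.
    return 0 - lof_adjustment


for base in (3.0, 25.0):
    print(base, csq_score(base, buggy_modifier(base, 500.0)), csq_score(base, fixed_modifier(500.0)))
```

With the seeded modifier, both transcripts score 500.0 regardless of their consequence; with the modifier seeded at 0, the base score still distinguishes them.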

Missingness of `.lof` and `.lof_flags` is not handled correctly:
- One problem is the same as the one mentioned for the original code.
- Another problem is that if `.lof` or `.lof_flags` is missing, `flag_condition` will be missing, and therefore `csq_score` will be missing.
- Also, if `(tc.lof == "HC")` is False, then `flag_condition` evaluates to False, so `no_flag_score` will be subtracted from the modifier. This should only happen if `(tc.lof == "HC")` is True.
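The missingness problems above can be reproduced with a toy model in plain Python, where `None` stands in for a Hail missing value. This is a simplification: every helper here simply propagates missingness, which is enough to reproduce the cases described, but it is not a faithful model of Hail's boolean operators. The helper names and the `flag_score = 500`, `no_flag_score = 1000` defaults are assumptions for illustration. The `_csq_score(tc)` seed is dropped to isolate the missingness issue.

```python
from typing import Optional

# Toy model: None stands in for a Hail missing value, and each helper
# propagates missingness (a simplification of Hail's real semantics).


def eq(a, b):
    return None if a is None or b is None else a == b


def ne(a, b):
    return None if a is None or b is None else a != b


def and_(a, b):
    return None if a is None or b is None else (a and b)


def if_else(cond, then, otherwise):
    # A missing condition gives a missing result.
    if cond is None:
        return None
    return then if cond else otherwise


def sub(a, b):
    return None if a is None or b is None else a - b


def buggy_lof_modifier(
    lof: Optional[str],
    lof_flags: Optional[str],
    flag_score: int = 500,
    no_flag_score: int = 1000,
) -> Optional[int]:
    # Mirrors the quoted snippet, minus the _csq_score(tc) seed.
    flag_condition = and_(eq(lof, "HC"), ne(lof_flags, ""))
    modifier = 0
    modifier = sub(modifier, if_else(flag_condition, flag_score, no_flag_score))
    modifier = sub(modifier, if_else(eq(lof, "OS"), 20, 0))
    modifier = sub(modifier, if_else(eq(lof, "LC"), 10, 0))
    return modifier


print(buggy_lof_modifier("HC", None))  # missing lof_flags -> whole modifier missing
print(buggy_lof_modifier(None, ""))    # non-LoF transcript -> also missing
print(buggy_lof_modifier("LC", ""))    # LC transcript picks up the no-flag adjustment
```

The last call shows the third bullet: an LC transcript is not HC, so `flag_condition` is False and `no_flag_score` is subtracted anyway.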

The updated function also has the following fixes to the original code (before the support for no polyphen was added):

- Allow `lof_flags` to be missing, in addition to `lof_flags == ""`, for `lof == "HC"` to have no flag penalty (note that this is a change from previous LOFTEE annotations).
- Pass `csq_order` to `add_most_severe_consequence_to_consequence` (with the default `csq_order` this would not have caused an issue).
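A plain-Python sketch of the intended LoF adjustment after the fix, following the `hl.case()` branches in the diff and the `hl.or_else(tc.lof_flags == "", True)` check that treats a missing `lof_flags` like an empty one. The function name is hypothetical. The `flag`/`no_flag` values follow the diff, and the OS/LC values (20 and 10) come from the original snippet. Returning 0 for a missing or other `lof` value is an assumption, since the full case expression is not shown here.

```python
from typing import Optional


def fixed_lof_modifier(
    lof: Optional[str], lof_flags: Optional[str], penalize_flags: bool = True
) -> int:
    # Score values as in the diff: flag = 500, no_flag = flag * (1 + penalize_flags).
    flag = 500
    no_flag = flag * (1 + penalize_flags)
    if lof == "HC":
        # Equivalent of hl.or_else(tc.lof_flags == "", True):
        # a missing lof_flags counts as "no flags".
        no_flags = True if lof_flags is None else lof_flags == ""
        return no_flag if no_flags else flag
    if lof == "OS":
        return 20
    if lof == "LC":
        return 10
    # Assumption: a missing or unrecognized lof gets no adjustment.
    return 0


print(fixed_lof_modifier("HC", None), fixed_lof_modifier("HC", "SINGLE_EXON"))
```

Because the `lof == "HC"` check comes first, the no-flag value can no longer leak to LC or OS transcripts, and a missing `lof` no longer makes the result missing.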

Here are some tests comparing the original code, the current code, and the code in this PR:

fixes_to_process_consequences_small.html.zip

mike-w-wilson

I had just a couple of comments on the readability of this, since we are focusing on the module now. I also realized that `process_consequences` produces a lot of data, since we keep the entire struct for all of the `worst_*_by_*` annotations. Not for this PR, but I feel we could add an option to trim this down so that, say, `worst_csq_by_gene` returns only a dict of gene symbol to csq term. Do you agree this could be useful? If so, I'll make a ticket for the backlog.
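A plain-Python sketch of the trimmed output proposed here. It assumes transcript consequences are records with `gene_symbol`, `most_severe_consequence`, and `csq_score` fields, and that a lower `csq_score` means more severe. `worst_csq_term_by_gene` is a hypothetical helper, not an existing gnomad function, and the example records are made up.

```python
from typing import Dict, List, TypedDict


class TranscriptCsq(TypedDict):
    gene_symbol: str
    most_severe_consequence: str
    csq_score: float


def worst_csq_term_by_gene(tcs: List[TranscriptCsq]) -> Dict[str, str]:
    """Keep only the most severe consequence term per gene (lowest csq_score wins)."""
    best: Dict[str, TranscriptCsq] = {}
    for tc in tcs:
        current = best.get(tc["gene_symbol"])
        if current is None or tc["csq_score"] < current["csq_score"]:
            best[tc["gene_symbol"]] = tc
    # Drop the full struct and keep just gene symbol -> consequence term.
    return {gene: tc["most_severe_consequence"] for gene, tc in best.items()}


example: List[TranscriptCsq] = [
    {"gene_symbol": "BRCA1", "most_severe_consequence": "missense_variant", "csq_score": 10.0},
    {"gene_symbol": "BRCA1", "most_severe_consequence": "stop_gained", "csq_score": 4.0},
    {"gene_symbol": "TP53", "most_severe_consequence": "synonymous_variant", "csq_score": 20.0},
]
print(worst_csq_term_by_gene(example))
```

The trade-off is that only the term survives per gene, so any other per-transcript fields would have to be recomputed if they were needed later.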

mike-w-wilson · 2024-06-14T12:17:28Z

gnomad/utils/vep.py

+                .when((tc.lof == "HC") & hl.or_else(tc.lof_flags == "", True), no_flag)
+                .when((tc.lof == "HC") & (tc.lof_flags != ""), flag)


I know this is unchanged from before, but since `no_flag` and `flag` are each used only once and you can't pass a value for `flag`, I think the variables make this harder to read, because you need to go back up to the assignment. I'd just use `500` and `500 / (1 + penalize_flags)` directly, if you agree with the value adjustment I suggested on the `flag`/`no_flag` assignment.

Just going to leave this as is for now, since my next PR completely removes the use of the scores.


mike-w-wilson · 2024-06-14T12:36:51Z

gnomad/utils/vep.py

+        flag = 500
+        no_flag = flag * (1 + penalize_flags)


Also, I know this is from before, but since we're in the process of revamping this module: I think the naming around these flags, and how we handle the logic in relation to them, is counterintuitive, i.e., the `penalize_flags` logic is actually a `no_flag` booster. These values seem arbitrary, and I can't imagine anyone is actually using these score values themselves, since LoF is deducted so far down. It would be clearer if the scores were `no_flag = 500` and `flag = 500 / (1 + penalize_flags)`.
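A small plain-Python check of the two parametrizations discussed here. The function names are hypothetical. Both keep the same `no_flag / flag` ratio of `1 + penalize_flags`, but the proposed version halves the absolute values when `penalize_flags` is True.

```python
from typing import Tuple


def current_scores(penalize_flags: bool) -> Tuple[float, float]:
    """(no_flag, flag) as currently written: flag fixed, no_flag boosted."""
    flag = 500
    no_flag = flag * (1 + penalize_flags)
    return no_flag, flag


def proposed_scores(penalize_flags: bool) -> Tuple[float, float]:
    """(no_flag, flag) as suggested in review: no_flag fixed, flag penalized."""
    no_flag = 500
    flag = 500 / (1 + penalize_flags)
    return no_flag, flag


for penalize in (False, True):
    cur, prop = current_scores(penalize), proposed_scores(penalize)
    # The no_flag / flag ratio matches; the absolute values do not.
    print(penalize, cur, prop, cur[0] / cur[1], prop[0] / prop[1])
```

Since `csq_score` is computed as `_csq_score(tc) - _csq_score_modifier(tc)`, the absolute values also feed into the score, so the relabeling keeps the relative flag penalty but is not guaranteed to be a pure renaming.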

Yeah, I'm actually completely removing the scores in my next PR

Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>

jkgoodrich · 2024-06-14T14:02:36Z

Yeah, my next PR completely changes this function, but it will take more time to review, so I'm just getting this in with minimal changes to fix the bug. These changes were my first round of fixes to the function, before I got completely confused by this and other functions in `vep.py` and rewrote it entirely.

mike-w-wilson

mike-w-wilson · 2024-06-14T14:35:06Z

Thank you!

jkgoodrich added 2 commits June 13, 2024 21:59

Fix bug in process consequences and clean up

eaa2e3a

Use updated add_most_severe_consequence_to_consequence(

223fa65

jkgoodrich added bug Changelog: bug fix labels Jun 14, 2024

jkgoodrich requested a review from mike-w-wilson June 14, 2024 04:08

jkgoodrich assigned klaricch, mike-w-wilson and jkgoodrich Jun 14, 2024

mike-w-wilson reviewed Jun 14, 2024


Update gnomad/utils/vep.py

832e7c5

Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>

Format

a391463

jkgoodrich requested a review from mike-w-wilson June 14, 2024 14:05

mike-w-wilson approved these changes Jun 14, 2024


jkgoodrich merged commit 97ec47e into main Jun 14, 2024
5 checks passed

jkgoodrich deleted the jg/fix_process_consequences_small branch June 14, 2024 14:51
