
Fix bug in process_consequences that was introduced when adding support for VEP without polyphen #710

Merged: jkgoodrich merged 4 commits into main from jg/fix_process_consequences_small on Jun 14, 2024


@jkgoodrich (Contributor) commented Jun 14, 2024

This adds the following fixes to the current process_consequences:

  • This part of the current code has several problems:
      modifier = _csq_score(tc)
      flag_condition = (tc.lof == "HC") & (tc.lof_flags != "")
      modifier -= hl.if_else(flag_condition, flag_score, no_flag_score)
      modifier -= hl.if_else(tc.lof == "OS", 20, 0)
      modifier -= hl.if_else(tc.lof == "LC", 10, 0)
    
  • I don't think modifier should start from _csq_score(tc), since the base score is already accounted for by:
      tcl = tcl.map(
          lambda tc: tc.annotate(csq_score=_csq_score(tc) - _csq_score_modifier(tc))
      )
    
  • Missingness of .lof and .lof_flags is not handled correctly:
    • One problem is the same one that affects the original code (described below).
    • Another problem is that if .lof or .lof_flags is missing, flag_condition will be missing, and therefore csq_score will be missing.
    • Also, if (tc.lof == "HC") is False, then flag_condition evaluates to False, so the no_flag_score will be subtracted from the modifier. This should only happen if (tc.lof == "HC") is True.
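To make these problems concrete, here is a pure-Python model of the snippet above. It is a sketch, not Hail code: None stands in for a Hail missing value, and the helpers eq, ne, kleene_and, if_else, and sub are hypothetical stand-ins that assume missing values propagate through ==, !=, hl.if_else, and subtraction, and that & follows three-valued (Kleene) logic. The base score of 100, the flag value "SINGLE_EXON", and penalize_flags=True are arbitrary illustration values.

```python
from typing import Optional, Union

# Assumption: None plays the role of a Hail missing value in this model.
Bool = Optional[bool]
Num = Optional[Union[int, float]]


def eq(a, b) -> Bool:
    """Equality that is missing if either side is missing."""
    if a is None or b is None:
        return None
    return a == b


def ne(a, b) -> Bool:
    """Inequality that is missing if either side is missing."""
    result = eq(a, b)
    return None if result is None else not result


def kleene_and(a: Bool, b: Bool) -> Bool:
    """Assumed three-valued AND: False wins, otherwise missing propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def if_else(cond: Bool, yes: Num, no: Num) -> Num:
    """A missing condition gives a missing result."""
    if cond is None:
        return None
    return yes if cond else no


def sub(a: Num, b: Num) -> Num:
    """Subtraction that propagates missing values."""
    if a is None or b is None:
        return None
    return a - b


def buggy_modifier(base_score: Num, lof: Optional[str], lof_flags: Optional[str],
                   penalize_flags: bool = True) -> Num:
    """Mirrors the buggy snippet line by line."""
    flag_score = 500
    no_flag_score = flag_score * (1 + penalize_flags)
    # The caller also starts from the base score, so it cancels out below.
    modifier = base_score
    flag_condition = kleene_and(eq(lof, "HC"), ne(lof_flags, ""))
    # no_flag_score is subtracted whenever flag_condition is False, even for non-HC.
    modifier = sub(modifier, if_else(flag_condition, flag_score, no_flag_score))
    modifier = sub(modifier, if_else(eq(lof, "OS"), 20, 0))
    modifier = sub(modifier, if_else(eq(lof, "LC"), 10, 0))
    return modifier


def buggy_csq_score(base_score: Num, lof: Optional[str], lof_flags: Optional[str]) -> Num:
    """Caller side: csq_score = _csq_score(tc) - _csq_score_modifier(tc)."""
    return sub(base_score, buggy_modifier(base_score, lof, lof_flags))


print(buggy_csq_score(100, "HC", "SINGLE_EXON"))  # 500: base cancels, sign flips
print(buggy_csq_score(100, "HC", None))  # None: missing lof_flags
print(buggy_csq_score(100, None, ""))  # None: missing lof
print(buggy_csq_score(100, "LC", ""))  # 1010: no_flag_score applied to LC
```

In the first case the intended result would be 100 - 500 = -400; the next two show a missing lof or lof_flags making the whole score missing, and the last shows the no-flag deduction being applied to an LC variant.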

The updated function also makes the following fixes to the original code (the version before support for VEP without PolyPhen was added):

  • For lof == "HC", treat a missing lof_flags the same as lof_flags == "", so that no flag penalty is applied (note that this is a change from previous LOFTEE annotations)
  • Pass csq_order to add_most_severe_consequence_to_consequence (with the default csq_order, the omission would not have caused an issue)
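The corrected LoF handling can be modeled the same way. This pure-Python sketch follows the two .when(...) conditions quoted in the review (lines +394 to +395 of the diff), with the flag and no_flag values from lines +382 to +383; the OS and LC branches reuse the values 20 and 10 from the original snippet, but their exact placement in the merged code is an assumption. It also assumes that a missing predicate falls through to the next branch (as Hail's hl.case does when missing_false=True) instead of making the result missing. All helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: None plays the role of a Hail missing value in this model.
Bool = Optional[bool]


def eq(a, b) -> Bool:
    """Equality that is missing if either side is missing."""
    if a is None or b is None:
        return None
    return a == b


def ne(a, b) -> Bool:
    """Inequality that is missing if either side is missing."""
    result = eq(a, b)
    return None if result is None else not result


def kleene_and(a: Bool, b: Bool) -> Bool:
    """Assumed three-valued AND: False wins, otherwise missing propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or_else(value, default):
    """Replace a missing value with a default (like hl.or_else)."""
    return default if value is None else value


def first_match(cases: List[Tuple[Bool, int]], default: int) -> int:
    """case/when chain: the first True predicate wins; missing falls through."""
    for predicate, value in cases:
        if predicate is True:
            return value
    return default


def fixed_lof_modifier(lof: Optional[str], lof_flags: Optional[str],
                       penalize_flags: bool = True) -> int:
    flag = 500
    no_flag = flag * (1 + penalize_flags)
    is_hc = eq(lof, "HC")
    return first_match(
        [
            # A missing lof_flags is treated like "" (no flags).
            (kleene_and(is_hc, or_else(eq(lof_flags, ""), True)), no_flag),
            (kleene_and(is_hc, ne(lof_flags, "")), flag),
            (eq(lof, "OS"), 20),
            (eq(lof, "LC"), 10),
        ],
        default=0,
    )


def fixed_csq_score(base_score: int, lof: Optional[str], lof_flags: Optional[str]) -> int:
    """The modifier is subtracted from the base score exactly once."""
    return base_score - fixed_lof_modifier(lof, lof_flags)


print(fixed_csq_score(100, "HC", "SINGLE_EXON"))  # -400
print(fixed_csq_score(100, "HC", None))  # -900: missing flags count as no flags
print(fixed_csq_score(100, None, None))  # 100: missing lof falls through
print(fixed_csq_score(100, "LC", ""))  # 90: no HC deduction for LC
```

Because each branch is tested independently and the first True one wins, the no-flag deduction can only apply when lof == "HC".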

Here are some tests comparing the original code, the current code, and the code in this PR:

fixes_to_process_consequences_small.html.zip


@mike-w-wilson left a comment


I had just a couple of comments on readability, now that we are focusing on this module. I also realized that process_consequences produces a lot of data, since we keep the entire struct for all of the worst_..._by_... annotations. Not for this PR, but I think we could add an option to trim this down so that, say, worst_csq_by_gene returns only a dict of gene symbol to consequence term. I'll make a ticket for the backlog if you agree this could be useful.
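A minimal sketch of the trimmed output suggested above, in plain Python rather than Hail. It assumes worst_csq_by_gene is a list of struct-like records, one per gene, with gene_symbol and most_severe_consequence fields; those field names, the helper name, and the placeholder genes are assumptions for illustration.

```python
from typing import Dict, List, Mapping


def trim_worst_csq_by_gene(worst_csq_by_gene: List[Mapping[str, object]]) -> Dict[str, str]:
    """Hypothetical helper: keep only gene symbol -> consequence term.

    Assumes one record per gene with 'gene_symbol' and
    'most_severe_consequence' keys; every other field is dropped.
    """
    return {
        str(record["gene_symbol"]): str(record["most_severe_consequence"])
        for record in worst_csq_by_gene
    }


# Placeholder records standing in for full consequence structs.
full = [
    {"gene_symbol": "GENE1", "most_severe_consequence": "stop_gained",
     "transcript_id": "TX1", "lof": "HC", "csq_score": -900},
    {"gene_symbol": "GENE2", "most_severe_consequence": "missense_variant",
     "transcript_id": "TX2", "lof": None, "csq_score": 12},
]
trimmed = trim_worst_csq_by_gene(full)
print(trimmed)
```

Everything except the gene-to-term mapping is discarded, which is the data reduction the comment is after; whether to expose it as an option of process_consequences is left open in the thread.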

Comment on lines +394 to +395
.when((tc.lof == "HC") & hl.or_else(tc.lof_flags == "", True), no_flag)
.when((tc.lof == "HC") & (tc.lof_flags != ""), flag)

I know this is unchanged from before, but since no_flag and flag are each used only once and you can't pass a value for flag, I think the variables make this harder to read, because you need to go back up to the assignment. I'd just use 500 and 500 / (1 + penalize_flags), if you agree with the value adjustment in my other comment.

Contributor Author

Just going to leave this as is for now, since my next PR will completely remove the use of the scores.

gnomad/utils/vep.py (outdated, resolved)
Comment on lines +382 to +383
flag = 500
no_flag = flag * (1 + penalize_flags)

Also, I know this is from before, but since we're in the process of revamping this module: I think both the naming and the way we handle the logic for these flags are counterintuitive, i.e., the penalize_flags logic is actually a no_flag booster. These values seem arbitrary, and I can't imagine anyone is actually using the score values themselves, since LoF is deducted so far down. It would be clearer if the scores were no_flag = 500 and flag = 500 / (1 + penalize_flags).
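The reviewer's point can be checked with a little arithmetic. In this plain-Python sketch (treating penalize_flags as a bool, which is an assumption), both parameterizations give the same ratio no_flag / flag = 1 + penalize_flags, so the relative treatment of flagged and unflagged HC variants is unchanged; what differs is the absolute size of the deductions, which only matters relative to the other terms in the score. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def current_scores(penalize_flags: bool) -> Tuple[Fraction, Fraction]:
    """As in the PR: flag is fixed and no_flag is boosted."""
    flag = Fraction(500)
    no_flag = flag * (1 + penalize_flags)
    return flag, no_flag


def proposed_scores(penalize_flags: bool) -> Tuple[Fraction, Fraction]:
    """Reviewer's proposal: no_flag is fixed and flag is reduced."""
    no_flag = Fraction(500)
    flag = no_flag / (1 + penalize_flags)
    return flag, no_flag


for penalize in (False, True):
    cur_flag, cur_no_flag = current_scores(penalize)
    new_flag, new_no_flag = proposed_scores(penalize)
    # Same no_flag / flag ratio in both versions; different absolute gaps.
    print(penalize, cur_no_flag / cur_flag, new_no_flag / new_flag,
          cur_no_flag - cur_flag, new_no_flag - new_flag)
# prints:
# False 1 1 0 0
# True 2 2 500 250
```

Fraction is used only so the division prints exactly; plain ints and floats would show the same relationship.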

Contributor Author

Yeah, I'm actually completely removing the scores in my next PR

Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
@jkgoodrich (Contributor Author) commented:

Yeah, my next PR completely changes this function, but it will take more time to review, so I'm just getting this in with minimal changes to fix the bug. These changes were my first round of fixes to the function, before I got completely confused by this and other functions in vep.py and rewrote it entirely.

@jkgoodrich jkgoodrich requested a review from mike-w-wilson June 14, 2024 14:05

@mike-w-wilson left a comment


:shipit:

@mike-w-wilson commented:

Thank you!

@jkgoodrich jkgoodrich merged commit 97ec47e into main Jun 14, 2024
5 checks passed
@jkgoodrich jkgoodrich deleted the jg/fix_process_consequences_small branch June 14, 2024 14:51