Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vep versions #282

Merged
merged 10 commits into from
Jan 20, 2021
Merged

Add vep versions #282

merged 10 commits into from
Jan 20, 2021

Conversation

jkgoodrich
Copy link
Contributor

Moved the VEPed context paths out of vep utils and into resources, also turning it into a versioned resource. If we add a VEPed context HT for v101 when fixed then we will need to have versioning. Also changed vep_or_lookup_vep to handle the versioning and if a version is not present it will warn the user that it is going to VEP everything from scratch.

Copy link
Contributor

@mike-w-wilson mike-w-wilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can implement or how often it will come up but if an individual spins up a hail cluster with VEP, the default config file will be copied to "file:///vep_data/vep-gcloud.json". If hail updates to VEP 101, someone running this script can request the VEP 95 context HT and if any variants get filtered to the revep_ht, they would then have a HT with VEP'd variants from two different versions of VEP. Is this something we want to try to prevent? Or maybe a just print a warning reminding people to make sure, the VEP context table is the same as the version of VEP they are using?

gnomad/utils/vep.py Outdated Show resolved Hide resolved
gnomad/utils/vep.py Outdated Show resolved Hide resolved
@jkgoodrich
Copy link
Contributor Author

I'm not sure if we can implement or how often it will come up but if an individual spins up a hail cluster with VEP, the default config file will be copied to "file:///vep_data/vep-gcloud.json". If hail updates to VEP 101, someone running this script can request the VEP 95 context HT and if any variants get filtered to the revep_ht, they would then have a HT with VEP'd variants from two different versions of VEP. Is this something we want to try to prevent? Or maybe a just print a warning reminding people to make sure, the VEP context table is the same as the version of VEP they are using?

Yeah, I thought about that too, but I am not sure how to grab the VEP version that was used in an init script for the running cluster. A warning would be easiest, but much less satisfying. @nawatts do you know an easy way to grab the version of VEP that will be used with hl.vep?

@nawatts
Copy link
Contributor

nawatts commented Jan 15, 2021

A similar issue occurs if someone uses a different VEP configuration that the one used to generate the context Table (see #165).

Maybe we could store the VEP version and configuration in the globals of the context Table and assert that they match the version / configuration used when vep_or_lookup_vep is run? It's still possible that the same VEP command could refer to different data files, but there's only so much we can do. We could also add a warning in the docstring.

It looks like VEP doesn't have a CLI option to print version info. The version is included in the output of vep --help. However, the format has changed between VEP 85 and 95.

VEP 85 has:

version 85

while VEP 95 has:

Versions:
  ensembl              : 95.4f83453
  ensembl-funcgen      : 95.94439f4
  ensembl-io           : 95.bd1a78d
  ensembl-variation    : 95.858de3e
  ensembl-vep          : 95.1

The easiest thing may be to store the entire output of vep --help vs trying to parse version(s) out of it. For getting the VEP help, this should work for Dataproc clusters created with hailctl dataproc start --vep. It's not very robust though, it assumes that the command is like /path/to/vep ... vs perl /path/to/vep ....

import json
import os
import subprocess

def get_vep_help():
    with hl.hadoop_open(os.environ["VEP_CONFIG_URI"]) as vep_config_file:
        vep_config = json.load(vep_config_file)
        vep_command = vep_config["command"]
        vep_help = subprocess.check_output([vep_command[0]]).decode("utf-8")
        return vep_help

Copy link
Contributor

@nawatts nawatts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments. The only issue I see is getting VEP help based on the vep_config_path argument vs the VEP_CONFIG_URI environment variable.

gnomad/utils/vep.py Outdated Show resolved Hide resolved
gnomad/utils/vep.py Show resolved Hide resolved
gnomad/utils/vep.py Outdated Show resolved Hide resolved
gnomad/utils/vep.py Show resolved Hide resolved
vep_context_help = hl.eval(reference_vep_ht.vep_help)
vep_context_config = hl.eval(reference_vep_ht.vep_config)

assert vep_help == vep_context_help, (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these also be checked if a reference_vep_ht is passed? Or do we assume anyone doing so is aware of the requirements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't going to because then is requires them to add those globals, but happy to force users to add them if you think it would be helpful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way. It would be nice to have the guard rails, but it's not obvious how to make a Table compatible with these functions. No objection to leaving them out, though it might be worth a note in the docstring to make sure the reference_vep_ht was generated with the same version of VEP / VEP configuration.

@jkgoodrich jkgoodrich requested a review from nawatts January 20, 2021 21:32
gnomad/utils/vep.py Outdated Show resolved Hide resolved
@jkgoodrich jkgoodrich requested a review from nawatts January 20, 2021 21:41
Copy link
Contributor

@mike-w-wilson mike-w-wilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jkgoodrich jkgoodrich merged commit e1059c9 into master Jan 20, 2021
@jkgoodrich jkgoodrich deleted the add_vep_versions branch January 20, 2021 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants