-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vep versions #282
Add vep versions #282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we can implement or how often it will come up but if an individual spins up a hail cluster with VEP, the default config file will be copied to "file:///vep_data/vep-gcloud.json". If hail updates to VEP 101, someone running this script can request the VEP 95 context HT and if any variants get filtered to the revep_ht, they would then have a HT with VEP'd variants from two different versions of VEP. Is this something we want to try to prevent? Or maybe a just print a warning reminding people to make sure, the VEP context table is the same as the version of VEP they are using?
Yeah, I thought about that too, but I am not sure how to grab the VEP version that was used in an init script for the running cluster. A warning would be easiest, but much less satisfying. @nawatts do you know an easy way to grab the version of VEP that will be used with hl.vep? |
A similar issue occurs if someone uses a different VEP configuration that the one used to generate the context Table (see #165). Maybe we could store the VEP version and configuration in the globals of the context Table and assert that they match the version / configuration used when It looks like VEP doesn't have a CLI option to print version info. The version is included in the output of VEP 85 has:
while VEP 95 has:
The easiest thing may be to store the entire output of import json
import os
import subprocess
def get_vep_help():
with hl.hadoop_open(os.environ["VEP_CONFIG_URI"]) as vep_config_file:
vep_config = json.load(vep_config_file)
vep_command = vep_config["command"]
vep_help = subprocess.check_output([vep_command[0]]).decode("utf-8")
return vep_help |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor comments. The only issue I see is getting VEP help based on the vep_config_path
argument vs the VEP_CONFIG_URI
environment variable.
vep_context_help = hl.eval(reference_vep_ht.vep_help) | ||
vep_context_config = hl.eval(reference_vep_ht.vep_config) | ||
|
||
assert vep_help == vep_context_help, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these also be checked if a reference_vep_ht
is passed? Or do we assume anyone doing so is aware of the requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't going to because then is requires them to add those globals, but happy to force users to add them if you think it would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way. It would be nice to have the guard rails, but it's not obvious how to make a Table compatible with these functions. No objection to leaving them out, though it might be worth a note in the docstring to make sure the reference_vep_ht
was generated with the same version of VEP / VEP configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Moved the VEPed context paths out of vep utils and into resources, also turning it into a versioned resource. If we add a VEPed context HT for v101 when fixed then we will need to have versioning. Also changed
vep_or_lookup_vep
to handle the versioning and if a version is not present it will warn the user that it is going to VEP everything from scratch.