Quickstart
This quickstart uses functionality in the dev branch of BAD_Mutations.
The config file stores paths to executables and reference data. Write it with the setup subcommand.
python BAD_Mutations.py setup \
-b /path/to/CDS/database/directory \
-t 'target_species_name' \
-e e_value_threshold \
-c /path/to/config.txt
After writing a config file, use the fetch subcommand to pull the CDS files from public repositories and convert them to BLAST databases. You may omit the -u and -p options if you do not want to type your username and password as plain text into a terminal (this is probably a good thing).
python BAD_Mutations.py fetch \
-c /path/to/config.txt \
-u 'user@domain.com' \
-p 'MyAwesomePassword123'
Note: the username and password are for the JGI Genome Portal.
Use the VeP_to_Subs.py supporting script to generate the "long" substitutions file and the per-transcript substitutions files.
mkdir -p /path/to/per-transcript/substitutions/directory
python Supporting/VeP_to_Subs.py \
/path/to/VeP_report.txt.gz \
/path/to/long_substitutions.txt \
/path/to/per-transcript/substitutions/directory
TBD
Generate multiple sequence alignments of putative homologues using the CDS query sequences. This runs on a transcript-by-transcript basis, so you can use a tool like GNU Parallel to run many transcripts concurrently. To parallelize, split the full CDS FASTA file for your species of interest into either (a) one sequence per file or (b) one gene per file. Each of these split files can then be passed to the -f option.
python BAD_Mutations.py align \
-c /path/to/config.txt \
-f /path/to/transcript.fa \
-o /path/to/MSA/output/directory
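The one-sequence-per-file split described above can be sketched in Python. This is a minimal sketch and not part of BAD_Mutations: the function name split_fasta, the output naming scheme (each file named after the first whitespace-delimited token of its header line), and the .fa extension are all assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Hypothetical helper, not part of BAD_Mutations. Each output file is
    named after the first whitespace-delimited token of the header line.
    Returns the paths of the files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file.
                    if handle is not None:
                        handle.close()
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("FASTA header without a sequence ID")
                    # Replace path separators so the ID is a safe file name.
                    safe_id = fields[0].replace("/", "_")
                    out_path = out_dir / f"{safe_id}.fa"
                    handle = open(out_path, "w")
                    written.append(out_path)
                # Lines before the first header are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

Each returned path can be passed to the -f option of the align subcommand. Splitting by gene instead would also need a transcript-to-gene mapping, which is not shown here.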
Run the HyPhy model on the specified codons in the multiple sequence alignment, conditioning on the phylogenetic tree. This is also run on a transcript-by-transcript basis.
python BAD_Mutations.py predict \
-c /path/to/config.txt \
-f /path/to/transcript.fa \
-a /path/to/MSA/output/directory/transcript_MSA.fasta \
-r /path/to/MSA/output/directory/transcript_tree.tree \
-s /path/to/per-transcript/substitutions/directory/transcript.subs \
-o /path/to/predictions/output/directory
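Because predict runs one transcript at a time, many runs can be launched side by side. The sketch below builds one predict command per transcript and runs them with a small worker pool. It is an illustration under stated assumptions, not part of BAD_Mutations: the helper names build_predict_cmd and run_all are hypothetical, and the NAME_MSA.fasta, NAME_tree.tree, and NAME.subs file names are inferred from the placeholder paths in the command above, so check them against the files your own run produced.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_predict_cmd(transcript_fa, config, msa_dir, subs_dir, out_dir):
    """Build the argument list for one `predict` run.

    Hypothetical helper. Assumes the alignment, tree, and substitutions
    files share the stem of the transcript FASTA file name; this naming
    is an assumption taken from the placeholder paths in the quickstart.
    """
    name = Path(transcript_fa).stem
    return [
        "python", "BAD_Mutations.py", "predict",
        "-c", str(config),
        "-f", str(transcript_fa),
        "-a", str(Path(msa_dir) / f"{name}_MSA.fasta"),
        "-r", str(Path(msa_dir) / f"{name}_tree.tree"),
        "-s", str(Path(subs_dir) / f"{name}.subs"),
        "-o", str(out_dir),
    ]


def run_all(commands, workers=4):
    """Run the commands concurrently; return exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A nonzero entry in the list returned by run_all marks a transcript whose run failed and may need to be repeated. Threads are enough here because each worker only waits on an external process.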
Combine the per-transcript predictions files into a single file for easy downstream analysis.
python BAD_Mutations.py compile \
-P /path/to/predictions/output/directory \
-S /path/to/long_substitutions.txt