lilei edited this page Nov 2, 2020 · 4 revisions

Quickstart

This quickstart uses functionality in the dev branch of BAD_Mutations.

Make a Config File

The config file stores paths to executables and reference data.

python BAD_Mutations.py setup \
    -b /path/to/CDS/database/directory \
    -t 'target_species_name' \
    -e e_value_threshold \
    -c /path/to/config.txt

Download CDS Files

After writing a config file, use the fetch subcommand to download the CDS files from public repositories and convert them to BLAST databases. You may omit the -u and -p options if you do not want to type your username and password as plain text into a terminal, which is usually the safer choice.

python BAD_Mutations.py fetch \
    -c /path/to/config.txt \
    -u 'user@domain.com' \
    -p 'MyAwesomePassword123'

Note: the username and password are for the JGI Genome Portal.

Generate Substitutions Files

Use the VeP_to_Subs.py supporting script to generate the "long" substitutions file and the per-transcript substitutions files.

mkdir -p /path/to/per-transcript/substitutions/directory
python Supporting/VeP_to_Subs.py \
    /path/to/VeP_report.txt.gz \
    /path/to/long_substitutions.txt \
    /path/to/per-transcript/substitutions/directory

Generate CDS Query Files

TBD

Generate Alignments and Trees

Generate multiple sequence alignments of putative homologues using the CDS query sequences. This step runs on a transcript-by-transcript basis, so you can use a tool like GNU Parallel to run many alignments concurrently. To parallelize, split the FASTA file containing all CDS sequences for your species of interest into either a) one sequence per file or b) one gene per file. Each of these "split" files can then be passed to the -f option.

python BAD_Mutations.py align \
    -c /path/to/config.txt \
    -f /path/to/transcript.fa \
    -o /path/to/MSA/output/directory
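
The splitting and parallel steps described above could be sketched in Python as follows. This is a minimal illustration, not part of BAD_Mutations: the function names (split_fasta, align_command, run_alignments) are hypothetical, the output files are named after the first word of each FASTA header (an assumption), and the script is assumed to run from the directory that contains BAD_Mutations.py. Only the align flags shown above (-c, -f, -o) are used.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of files written, in input order. Records that
    share an ID overwrite each other, so IDs are assumed to be unique.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Assumption: the first whitespace-delimited token is the ID.
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else f"seq{len(written) + 1}"
                # Replace characters that are unsafe in file names.
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                out_path = out_dir / f"{safe_id}.fa"
                handle = open(out_path, "w")
                written.append(out_path)
            # Lines before the first header are skipped.
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return written


def align_command(config, fasta, out_dir):
    """Build the argument list for one align run (flags as shown above)."""
    return [
        "python", "BAD_Mutations.py", "align",
        "-c", str(config),
        "-f", str(fasta),
        "-o", str(out_dir),
    ]


def run_alignments(fasta_files, config, out_dir, workers=4):
    """Run align on several split files at once; return the exit codes."""
    commands = [align_command(config, f, out_dir) for f in fasta_files]
    # Threads only wait on child processes, so a thread pool is sufficient.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For example, split_fasta("/path/to/all_CDS.fa", "/path/to/split") would produce one file per sequence, and the returned list can be passed to run_alignments together with the config file and the MSA output directory. Splitting by gene rather than by sequence would need a mapping from transcripts to genes, which this sketch does not attempt.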

Predict Substitutions

Run the HyPhy model on the specified codons in the multiple sequence alignment, conditioning on the phylogenetic tree. Like the alignment step, this runs on a transcript-by-transcript basis.

python BAD_Mutations.py predict \
    -c /path/to/config.txt \
    -f /path/to/transcript.fa \
    -a /path/to/MSA/output/directory/transcript_MSA.fasta \
    -r /path/to/MSA/output/directory/transcript_tree.tree \
    -s /path/to/per-transcript/substitutions/directory/transcript.subs \
    -o /path/to/predictions/output/directory
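
Because predict takes several per-transcript inputs, it can help to build each command from the transcript name. The sketch below is illustrative only: predict_command and run_predictions are hypothetical helpers, and the file naming (name.fa, name_MSA.fasta, name_tree.tree, name.subs) is an assumption extrapolated from the placeholder paths in the command above, so check it against the files your own runs produce. The script is assumed to run from the directory that contains BAD_Mutations.py.

```python
import subprocess
from pathlib import Path


def predict_command(transcript, config, fasta_dir, msa_dir, subs_dir, out_dir):
    """Build the argument list for one predict run.

    The file names are assumed to follow the pattern of the placeholder
    paths in the quickstart; adjust them if your outputs differ.
    """
    return [
        "python", "BAD_Mutations.py", "predict",
        "-c", str(config),
        "-f", str(Path(fasta_dir) / f"{transcript}.fa"),
        "-a", str(Path(msa_dir) / f"{transcript}_MSA.fasta"),
        "-r", str(Path(msa_dir) / f"{transcript}_tree.tree"),
        "-s", str(Path(subs_dir) / f"{transcript}.subs"),
        "-o", str(out_dir),
    ]


def run_predictions(transcripts, config, fasta_dir, msa_dir, subs_dir, out_dir):
    """Run predict for each transcript in turn; return names that failed."""
    failed = []
    for name in transcripts:
        cmd = predict_command(name, config, fasta_dir, msa_dir, subs_dir, out_dir)
        # A non-zero exit code is recorded rather than raised, so one bad
        # transcript does not stop the rest of the batch.
        if subprocess.run(cmd).returncode != 0:
            failed.append(name)
    return failed
```

Collecting the failed names instead of stopping at the first error makes it easier to rerun only the transcripts that need attention.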

Compile Predictions

Combine the per-transcript predictions files into a single file for easy downstream analysis.

python BAD_Mutations.py compile \
    -P /path/to/predictions/output/directory \
    -S /path/to/long_substitutions.txt