Manual_v2.0

Overview

BAD_Mutations (BLAST-Aligned-Deleterious?) performs a likelihood ratio test (LRT) for the prediction of deleterious variants. The package consists of Python and Bourne Again Shell (BASH) scripts, and the LRT itself is run by a HyPhy script. BAD_Mutations was originally written with Python 2 syntax, but is being fully converted to Python 3. BAD_Mutations is designed to be run from the command line. Running it from an interactive Python environment is neither recommended nor supported.

BAD_Mutations contains five major subcommands: setup, fetch, align, predict, and compile. Both setup and fetch are meant to be run once, or very rarely. The align subcommand generates phylogenetic trees and multiple sequence alignments for input to the prediction scripts. The predict subcommand does the actual variant effect prediction. More information about how to run BAD_Mutations is available in the “Usage” section.

Briefly, BAD_Mutations predicts deleterious variants using a sequence constraint approach. For a given query gene sequence and list of nonsynonymous SNPs, a multiple sequence alignment among homologues is produced, and the given codons are tested for conservation. Variants that alter a codon with a high degree of conservation are inferred to be deleterious. More details on the procedure used by BAD_Mutations are available in the “Methods” section.

An alignment of phytochrome C (PhyC) used for prediction of a nonsynonymous SNP in barley is shown below.

[Figure: Alignment]

The phylogenetic relationships of the gene sequences are shown on the left. The consensus sequence is shown at the top. Dots indicate identity with the consensus, and letters show mismatches. Colored boxes behind codon triplets correspond to amino acid residues. The query polymorphism is shown in yellow. The codon is conserved, so the SNP is inferred to be deleterious. This polymorphism is causative for early maturity in barley (Nishida et al., 2013).

Examples of alignment columns that produce deleterious and 'tolerated' predictions are shown below.

[Figure: Deleterious and tolerated]

The one-letter amino acid code for the derived state of the variant is shown on the left, and the ancestral state is shown in the center. The alignment column is shown as a string of amino acid codes on the right, with dashes representing gaps. Deleterious SNPs alter columns that have much higher amino acid conservation than tolerated SNPs.
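The difference between the two kinds of columns can be made concrete with a small Python sketch. It is not the likelihood ratio test that BAD_Mutations runs through HyPhy; it only counts the non-gap residues in a column and the share of them that match the most common residue. The function name `column_summary` and both example columns are hypothetical.

```python
from collections import Counter


def column_summary(column, gap="-"):
    """Summarize one alignment column given as a string of amino acid codes.

    Returns (non-gap residue count, fraction matching the most common residue).
    This is a crude illustration, not the LRT used by BAD_Mutations.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0, 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return len(residues), most_common_count / len(residues)


# Hypothetical columns: one strongly conserved, one variable.
conserved = "RRRRRRRR-RRRRRRR"
variable = "RKQ-STAKRNE-GDRK"

print(column_summary(conserved))
print(column_summary(variable))
```

A column like the first would be expected to give a deleterious prediction for a variant that changes the residue, while a column like the second would more likely be tolerated.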

Header	Value Type	Description
Position	Integer	Nucleotide position in the MSA
L0	Float	Likelihood of null hypothesis - codon evolving neutrally
L1	Float	Likelihood of alt hypothesis - codon evolving under selective constraint
Constraint	Float	A constraint value for the codon across the phylogeny
ChiSquared	Float	Chi-squared test statistic for the likelihood ratio test
P-value	Float	A p-value for the likelihood ratio test
SeqCount	Integer	Number of non-gap amino acid residues in the alignment at that position
Alignment	String	Alignment column, showing amino acids and gaps
ReferenceAA	String	Amino acid state in reference species
MaskedConstraint	Float	A constraint value for the codon across the phylogeny, without the reference species
MaskedP-value	Float	A p-value for the likelihood ratio test, without the reference species
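A report with these columns can be loaded with the Python standard library. The sketch below assumes the report is tab-delimited with the header names listed above, which this section does not state explicitly. The function name `read_report`, the sample rows, and the 0.05 cutoff are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Columns that hold numbers, following the "Value Type" column above.
INT_FIELDS = ("Position", "SeqCount")
FLOAT_FIELDS = ("L0", "L1", "Constraint", "ChiSquared", "P-value",
                "MaskedConstraint", "MaskedP-value")


def read_report(handle):
    """Yield one dict per report row with numeric fields converted."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        yield row


# Invented example rows; real values come from the prediction output.
example = io.StringIO(
    "Position\tL0\tL1\tConstraint\tChiSquared\tP-value\tSeqCount\t"
    "Alignment\tReferenceAA\tMaskedConstraint\tMaskedP-value\n"
    "301\t-20.5\t-12.1\t0.02\t16.8\t0.00004\t15\tRRRRRRRRRRRRRRR\tR\t0.03\t0.0001\n"
    "455\t-18.0\t-17.6\t0.90\t0.8\t0.37\t14\tRKQSTAKRNEGDRK\tR\t0.95\t0.41\n"
)

rows = list(read_report(example))
# Hypothetical cutoff, for illustration only.
significant = [r["Position"] for r in rows if r["MaskedP-value"] < 0.05]
print(significant)
```

Filtering on the masked columns rather than the unmasked ones is a choice made for this sketch; either pair of columns could be used the same way.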

General Options

Option	Value	Description
`-h`	NA	Show help message and exit.
`-v/--verbose`	'DEBUG'	Be very verbose. Print all messages.
`-v/--verbose`	'INFO'	Just print info, warning, and error messages. Useful for progress checking.
`-v/--verbose`	'WARNING'	Print warnings and errors. Default setting.
`-v/--verbose`	'ERROR'	Only print error messages.
`-v/--verbose`	'CRITICAL'	Print almost nothing. Critical failures only.
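The five values accepted by `-v/--verbose` have the same names as the standard levels in Python's `logging` module. The sketch below shows how a command-line tool could map such an option onto `logging`. It is an illustration under that assumption, not the actual BAD_Mutations argument-handling code, and the program and logger names are hypothetical.

```python
import argparse
import logging

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

parser = argparse.ArgumentParser(prog="example")  # hypothetical program name
parser.add_argument("-v", "--verbose", choices=LEVELS, default="WARNING",
                    help="Minimum severity of messages to print.")

# Parse an explicit argument list so the sketch runs without a real command line.
args = parser.parse_args(["-v", "INFO"])

logger = logging.getLogger("example")  # hypothetical logger name
logger.setLevel(getattr(logging, args.verbose))
logger.addHandler(logging.StreamHandler())

logger.debug("hidden at INFO verbosity")
logger.info("shown at INFO verbosity")
```

With the level set to INFO, only the second message is printed. Leaving the option off falls back to WARNING, matching the default in the table.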

The `setup` Subcommand

Option	Value	Description
`--list-species`	NA	Show all species databases available.
`-c/--config`	[FILE]	Name of the configuration file. Defaults to `LRTPredict_Config.txt`.
`-b/--base`	[DIR]	Directory to store the BLAST databases. Defaults to the current directory.
`-d/--deps-dir`	[DIR]	Directory to download and store the dependencies. Defaults to current directory.
`-t/--target`	[SP_NAME]	Target species name. Must be one of the species (case sensitive) given by `--list-species`. This species will be excluded from the prediction pipeline to avoid reference bias. No default.
`-e/--evalue`	[FLOAT]	E-value threshold for accepting TBLASTX hits as putative homologues. Defaults to 0.05.

The `fetch` Subcommand

Option	Value	Description
`-c/--config`	[FILE]	Path to configuration file. Defaults to `LRTPredict_Config.txt`.
`-b/--base`*	[DIR]	Directory to store the BLAST databases. Defaults to the current directory.
`-u/--user`	[STR]	Username for JGI Genome Portal. Required.
`-p/--password`	[STR]	Password for JGI Genome Portal. If not supplied on command line, will prompt user for the password.
`--fetch-only`	NA	If supplied, do not convert CDS FASTA files into BLAST databases.
`--convert-only`	NA	If supplied, only unzip and convert FASTA files into BLAST databases. Do not download.

The `predict` Subcommand

Option	Value	Description
`-a/--alignment`	[FILE]	Path to the multiple sequence alignment file. Required.
`-c/--config`	[FILE]	Path to configuration file. Defaults to `LRTPredict_Config.txt`.
`-r/--tree`	[FILE]	Path to the phylogenetic tree. Required.
`-s/--substitutions`	[FILE]	Path to substitutions file. Required.
`-o/--output`	[DIR]	Directory for output. Defaults to current directory.
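The required and optional arguments in the table above can be mirrored in a short `argparse` declaration. This is a hypothetical sketch for illustration, not the BAD_Mutations source: the program name and the file names passed in are invented, and only the option names and defaults come from the table.

```python
import argparse

parser = argparse.ArgumentParser(prog="predict-sketch")  # hypothetical name
parser.add_argument("-a", "--alignment", required=True,
                    help="Path to the multiple sequence alignment file.")
parser.add_argument("-c", "--config", default="LRTPredict_Config.txt",
                    help="Path to configuration file.")
parser.add_argument("-r", "--tree", required=True,
                    help="Path to the phylogenetic tree.")
parser.add_argument("-s", "--substitutions", required=True,
                    help="Path to substitutions file.")
parser.add_argument("-o", "--output", default=".",
                    help="Directory for output.")

# Invented file names, used only to exercise the parser.
args = parser.parse_args(["-a", "gene.fasta", "-r", "gene.tree", "-s", "gene.subs"])
print(args.config, args.output)
```

Omitting any of the three required options makes `argparse` print a usage error and exit, which is the behavior one would expect from options marked "Required".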

Table of Contents

Overview

Citation

Downloading

dev Functionality

Dependencies

Installation of Conda Environment

Instructions for UMN MSI

Input Files

A Note on Transcript Names

Output

Raw HyPhy Format

Compiled HyPhy Report

Making Deleterious Predictions

Inferring Ancestral States

Usage

Basic Invocation

Subcommands, Options, and Switches

General Options

The setup Subcommand

The fetch Subcommand

The align Subcommand

The predict Subcommand

The compile Subcommand

Example Command Lines

A Note on Parallel Execution

Configuration File Format

Runtimes and Benchmarks

Methods
