nSPECTRa

Mutation spectra analyses

nSPECTRa is a nextflow workflow that implements several methods to compute the mutation spectra for a given set of variants and multiple genome alignments which includes the ancestral genomes.

Requirements

The workflow comes with an anaconda environment which delivers most of the dependencies. However, you need to install:

anaconda
- if run with conda, install gcc > 5 (tested with version 7.3.0)
nextflow
beagle v5 or newer jar file. The workflow can download this automatically if not specified.
relate software suite Most of the remaining dependencies are downloaded by nSPECTRa at runtime.

Updates

Check the CHANGELOG for updates to this workflow.

Inputs

To run the workflow effectively, you'll need:

A vcf file with the genotypes for the samples to analyse, tbi-indexed
An HAL alignment archive inclusive of the ancestral genome and the same genome used to call the variants in the VCF
If running relate, the population lists (one text file with the IDs of the samples in each population; e.g. Pop1.txt includes the list of samples in Pop1) or a poplist prepared as described in the relate documentation
Identifier of the reference genome in the HAL archive
Identifier of the ancestral genome in the HAL archive
An effective population size estimate to use at imputation time

To generate the HAL alignment archive, we suggest to use cactus which automatically infers the ancestral sequence.

Workflow overview

The workflow constitutes of five separate components, repesented in figure below:

In short, there are six separate units:

VCF preprocessing: the vcf file gets imputed using either Beagle or SHAPEIT4, and then annotated using the VEP.
Ancestral genome generation: the workflow generate a version of the reference genome carrying the ancestral allele estimated by cactus
Exclude constrained elements: the workflow extract the constrained elements using halPhyloP from the HAL suite alignments.
K-mer mutation spectra: compute the mutation spectra for each individual at different K-mers using mutyper
Sequential dinucleotide mutation: compute the sequential dinucleotide mutation (SDM) spectra for the different individuals
Compute the mutation spectra evolution: compute the trend of the different mutation classes in the different populations using relate

The different components can be switches off depending on the analytical requirements and public resources available.

Algorithm choice

The user can choose between two different imputation algorithms and three algorithms to compute the mutation spectra.

Imputation

The user can choose to impute either with shapeit v4 or beagle v5 or newer. Both are supported, but only shapeit is included in the anaconda environment, which is the reason why it is the default choice. In case the user prefers to use beagle, this needs to be downloaded manually and the path provided with the flag --beagle. If --beagle is not provided, the workflow will download the version 5.2 automatically.
The user can choose what software to use, but keep in mind that they are mutually exclusive, and shapeit will be prioritized over beagle.

Mutation spectra

The software currently supports three software to compute the mutation spectra:

--mutyper: the default choice, it can compute the spectra for different K-mer size;
--relate: used to define the mutation spectra changes over time and an accurate estimate of the effective population size;
--sdm: method to define sequential dinucleotide polymorphisms, MNPs and adjacent SNPs from Prendergast et al., 2019.

Some steps of the workflow are very computationally intensive, and can benefit from a high degree of chunking. For large datasets (e.g. 1000GP size), we recommend lower the chunk size (e.g. to 50000), facilitating some compute intensive steps.

Input pre-filtering

We recommend to pre-filter the vcf file to obtain samples with a reasonable coverage (>8-10 mean DP) and with variants with low call rate removed from the dataset (CCR > 90%) and minor allele count MAC >= 2.

Name		Name	Last commit message	Last commit date
Latest commit History 383 Commits
assets		assets
bin		bin
conf		conf
envs		envs
imgs		imgs
include		include
test		test
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config
singularity.def		singularity.def

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nSPECTRa

Mutation spectra analyses

Requirements

Updates

Inputs

Workflow overview

Algorithm choice

Imputation

Mutation spectra

Input pre-filtering

About

Releases 2

Packages

Languages

License

evotools/nSPECTRa

Folders and files

Latest commit

History

Repository files navigation

nSPECTRa

Mutation spectra analyses

Requirements

Updates

Inputs

Workflow overview

Algorithm choice

Imputation

Mutation spectra

Input pre-filtering

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages