A Snakemake pipeline to create a transcriptome without the use of a reference genome
Creating a de novo transcriptome.fasta from single-end RNA-seq reads of a species for which a reference genome is not available
- Snakefile containing the targeted output and the rules to generate them from the input files.
- config/, folder containing the configuration files that make the Snakefile adaptable to any input files, genome and rule parameters.
- sampleSheets, folder containing the tab-separated sample descriptions and the sample files (with their paths).
- envs/, folder containing the environments needed for the Snakefile to run. A separate environment is needed for MACS2, as MACS2 uses Python 2.7, following the information found here.
- To be added: fasqs folder containing the raw single-end reads.
First, install all the required software and packages with the Conda package manager.
- Create a virtual environment named "RNA-Seq" from the DCM.yaml file with the following command:
conda env create --name RNA-Seq --file ~/envs/DCM.yaml
- Then, activate this virtual environment with:
source activate RNA-Seq
Now the versions of all the basic software and packages in use are the ones listed in the DCM.yaml file. The other environments (hisat2, subRead, etc.) will automatically be created and activated when requested by a rule.
The ~/configs/config.yaml file specifies the sample data file, the genomic and transcriptomic reference FASTA files to use, the parameters for the rules, etc. Thanks to this file, the Snakefile does not need to be edited when locations or parameters change.
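To illustrate how a tab-separated sample sheet, such as those kept in sampleSheets, could be read before the rules use it, here is a minimal Python sketch. The column names (sample, fq, condition) and the file paths are hypothetical and chosen only for illustration; the actual sample sheet layout of this pipeline may differ.

```python
import csv
import io

# Hypothetical sample sheet contents; the real column names may differ.
SAMPLE_SHEET = (
    "sample\tfq\tcondition\n"
    "s1\treads/s1.fastq.gz\tcontrol\n"
    "s2\treads/s2.fastq.gz\ttreated\n"
)


def read_sample_sheet(handle):
    """Return a dict mapping each sample name to its row of metadata."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row for row in reader}


# An in-memory file stands in for a real sample sheet on disk.
samples = read_sample_sheet(io.StringIO(SAMPLE_SHEET))
fastq_files = [row["fq"] for row in samples.values()]
print(fastq_files)
```

With a real file, the same function can be called on an open file handle instead of the in-memory string.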
The Snakemake pipeline/workflow management system reads a master file (often called Snakefile) that lists the steps to be executed and defines their order.
It has many rich features. Read more here.
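Snakemake works out the execution order from the dependencies between rules. The toy Python sketch below illustrates that idea only; it is not Snakemake's implementation, and the rule names (trim_reads, assemble, all) are hypothetical rather than taken from this pipeline's Snakefile. It assumes Python 3.9 or later for the graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each rule maps to the set of rules whose output it needs.
RULE_DEPENDENCIES = {
    "all": {"assemble"},
    "assemble": {"trim_reads"},
    "trim_reads": set(),
}


def execution_order(dependencies):
    """Return rule names so that every rule comes after the rules it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(RULE_DEPENDENCIES)
print(order)
```

The rule with no dependencies comes first and the final target comes last, which mirrors how a requested output pulls in the steps needed to produce it.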
Use the command snakemake --use-conda -np to perform a dry run that prints out the rules and commands.
To run the pipeline, type snakemake --use-conda and provide the number of cores, for instance --cores 10 for ten cores.
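If the pipeline is launched from a script, the dry run and the real run differ only in their flags. The sketch below assembles both command lines in Python; the helper name snakemake_command is hypothetical, and only the flags mentioned in this README (--use-conda, -np, --cores) are used.

```python
import shlex


def snakemake_command(cores=None, dry_run=False):
    """Build the argument list for a snakemake invocation."""
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        # -n performs a dry run, -p prints the shell commands.
        cmd.append("-np")
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


dry_run_cmd = snakemake_command(dry_run=True)
full_run_cmd = snakemake_command(cores=10)
print(shlex.join(dry_run_cmd))
print(shlex.join(full_run_cmd))
```

Either list can be passed to subprocess.run(cmd, check=True) from the directory containing the Snakefile to actually start Snakemake.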
For cluster execution, please refer to the Snakemake reference.
trinity.Trinity.fasta: a FASTA file containing all the predicted transcripts.