Skip to content

pipeline to create a transcriptome fasta from raw RNA-seq fastq without a reference genome

License

Notifications You must be signed in to change notification settings

KoesGroup/Snakemake_trinity

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snakemake_trinity

A snakemake pipeline to create a transcriptome without the use of a reference genome

Snakemake Miniconda

Aim

Creating a Denovo transcriptome.fasta from single-end RNAseq reads of a species of wich a reference genome is not available

Description

Content

  • Snakefile containing the targeted output and the rules to generate them from the input files.
  • config/ , folder containing the configuration files making the Snakefile adaptable to any input files, genome and parameter for the rules.
  • sampleSheets folder containing the tab-searated sample descriptions and files(and path to)
  • envs/, folder containing the environments needed for the Snakefile to run. Need to make one specifically for MACS2 as MACS2 uses python 2.7 following the information found here.
  • To be added: fasqs folder containing the raw single-end reads.

Usage

Conda environment

First, you need to install all softwares and packages needed with the Conda package manager.

  1. Create a virtual environment named "chipseq" from the environment.yaml file with the following command: conda env create --name RNA-Seq --file ~/envs/DCM.yaml
  2. Then, activate this virtual environment with: source activate RNA-Seq
    Now, all the basic softwares and packages versions in use are the one listed in the DCM.yaml file. The other environments (hisat2, subRead etc) will automatically be created and activated when requested by a rule.

Configuration file

The ~/configs/config.yaml file specifies the sample data file, the genomic and transcriptomic reference fasta files to use, the parameters for the rules to use, etc. This file is used so the Snakefile does not need to be changed when locations or parameters need to be changed.

Snakemake execution

The Snakemake pipeline/workflow management system reads a master file (often called Snakefile) to list the steps to be executed and defining their order. It has many rich features. Read more here.

Dry run

Use the command snakemake --use-conda -np to perform a dry run that prints out the rules and commands.

Real run

Simply type Snakemake --use-conda and provide the number of cores with --cores 10 for ten cores for instance.
For cluster execution, please refer to the Snakemake reference.

Main outputs

trinity.Trinity.fasta a fasta file containing all the predicted transcripts. trinity.Trinity.fasta.

About

pipeline to create a transcriptome fasta from raw RNA-seq fastq without a reference genome

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages