Read_Mapping

Skylar Wyant edited this page Oct 31, 2017 · 21 revisions

Basic Usage

Read_Mapping submits a task array of QSub jobs to the Portable Batch System (PBS) job scheduler to map reads with the Burrows-Wheeler Aligner (BWA-MEM). If the provided reference is not already indexed, it can also index the FastA file using BWA.

To run Read_Mapping, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Read_Mapping can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Read_Mapping Config

Where Config is the full file path to the configuration file.

Handler-Specific Variables

The following is a list of variables that must be defined within Config. In addition to these handler-specific variables, all common variables must be defined. The default parameters listed here are designed for cultivated barley and will need to be adjusted on a per-species basis.

| Variable | Function | Default Value |
| --- | --- | --- |
| RM_QSUB | QSub settings for batch submission. Recommended settings are "mem=22gb,nodes=1:ppn=16,walltime=24:00:00". Some samples may require more than the 24 hours allowed by lab, so the use of mesabi is necessary. For more information, see the FAQ. | |
| TRIMMED_LIST | A list of adapter-trimmed or quality-trimmed samples to read map. This will be ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt (Adapter_Trimming) or ${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt (Quality_Trimming). | |
| FORWARD_TRIMMED | Shared suffix for forward reads. This will be _Forward_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R1_trimmed.fastq.gz (Quality_Trimming). | |
| REVERSE_TRIMMED | Shared suffix for reverse reads. This will be _Reverse_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _R2_trimmed.fastq.gz (Quality_Trimming). | |
| SINGLES_TRIMMED | Shared suffix for single reads. This will be _Single_ScytheTrimmed.fastq.gz (Adapter_Trimming) or _single_trimmed.fastq.gz (Quality_Trimming). | |
| THREADS | How many threads to use. | 8 |
| SEED | Minimum seed length. | 8 |
| WIDTH | Band width. | 100 |
| DROPOFF | Off-diagonal x-dropoff (Z-dropoff). | 100 |
| RE_SEED | Re-seed value. | 1.0 |
| CUTOFF | Cutoff value. | 10000 |
| MATCH | Matching score. | 1 |
| MISMATCH | Mismatch penalty. | 4 |
| GAP | Gap penalty. | 8 |
| EXTENSION | Gap extension penalty. | 1 |
| CLIP | Clipping penalty. | 6 |
| UNPAIRED | Unpaired read penalty. | 9 |
| RESCUE | Attempt to rescue missing hits in paired-end mode? Note: this means that reads may not be matched. | false |
| INTERLEAVED | Is the first input query interleaved? | false |
| RM_THRESHOLD | Minimum threshold. | 85 |
| SECONDARY | Output all alignments and mark as secondary. | false |
| APPEND | Append FastA/Q comments to SAM files. | false |
| HARD | Use hard clipping. | false |
| SPLIT | Mark split hits as secondary. | true |
| VERBOSITY | Verbosity level. Choose from 'disabled', 'errors', 'warnings', 'all', or 'debug'. | 'all' |
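
As an illustration, the handler-specific block of a Config file for adapter-trimmed, paired-end barley samples might look like the sketch below. It assumes Config uses plain shell variable assignments; the OUT_DIR and PROJECT values are hypothetical placeholders for common variables that would normally be defined elsewhere in Config, and all other values are the recommendations and defaults listed in the table.

```shell
# Hypothetical placeholders for common variables (normally set elsewhere in Config)
OUT_DIR="/path/to/output"
PROJECT="MyProject"

# Scheduler settings recommended in the table
RM_QSUB="mem=22gb,nodes=1:ppn=16,walltime=24:00:00"

# Input list and suffixes, here for samples coming from Adapter_Trimming
TRIMMED_LIST="${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
FORWARD_TRIMMED="_Forward_ScytheTrimmed.fastq.gz"
REVERSE_TRIMMED="_Reverse_ScytheTrimmed.fastq.gz"
SINGLES_TRIMMED="_Single_ScytheTrimmed.fastq.gz"

# Alignment parameters (defaults listed for cultivated barley)
THREADS=8
SEED=8
WIDTH=100
DROPOFF=100
RE_SEED=1.0
CUTOFF=10000
MATCH=1
MISMATCH=4
GAP=8
EXTENSION=1
CLIP=6
UNPAIRED=9
RESCUE=false
INTERLEAVED=false
RM_THRESHOLD=85
SECONDARY=false
APPEND=false
HARD=false
SPLIT=true
VERBOSITY='all'
```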

Note: If running single-end samples, leave FORWARD_TRIMMED and REVERSE_TRIMMED filled with values that do not match your samples. If running paired-end samples, leave SINGLES_TRIMMED filled with values that do not match your samples.
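
The note above can be illustrated with a small sketch of suffix matching. This is not taken from the Read_Mapping source; it only assumes that samples are selected by checking which entries in the list end with each suffix. The file paths, the `count_suffix` helper, and the `_NoSuchSuffix.fastq.gz` value are hypothetical.

```shell
# Hypothetical paired-end sample list, written to a temporary file
TRIMMED_LIST="$(mktemp)"
printf '%s\n' \
    "/data/SampleA_Forward_ScytheTrimmed.fastq.gz" \
    "/data/SampleA_Reverse_ScytheTrimmed.fastq.gz" \
    "/data/SampleB_Forward_ScytheTrimmed.fastq.gz" \
    "/data/SampleB_Reverse_ScytheTrimmed.fastq.gz" > "${TRIMMED_LIST}"

FORWARD_TRIMMED="_Forward_ScytheTrimmed.fastq.gz"
REVERSE_TRIMMED="_Reverse_ScytheTrimmed.fastq.gz"
SINGLES_TRIMMED="_NoSuchSuffix.fastq.gz"   # deliberately matches no sample

# Count the lines in the list that end with the given suffix
count_suffix() {
    suffix="$1"
    n=0
    while IFS= read -r line; do
        case "${line}" in
            *"${suffix}") n=$((n + 1)) ;;
        esac
    done < "${TRIMMED_LIST}"
    echo "${n}"
}

N_FORWARD="$(count_suffix "${FORWARD_TRIMMED}")"
N_REVERSE="$(count_suffix "${REVERSE_TRIMMED}")"
N_SINGLES="$(count_suffix "${SINGLES_TRIMMED}")"
echo "forward=${N_FORWARD} reverse=${N_REVERSE} singles=${N_SINGLES}"
```

Because the singles suffix matches nothing in the list, only the forward and reverse files would be picked up in this scenario.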

Output

If your reference genome is not indexed, Read_Mapping generates an index for the reference genome in the same directory as the reference genome. After indexing, Read_Mapping will exit, so you will need to run Read_Mapping a second time to map reads.
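
To predict whether a given run will index or map, you can check for the index files yourself beforehand. The sketch below assumes the five file extensions that `bwa index` writes by default next to the reference (.amb, .ann, .bwt, .pac, .sa); the `reference_is_indexed` helper and the REFERENCE path are hypothetical, and Read_Mapping's own check may differ.

```shell
# Return 0 if all five default BWA index files exist next to the reference
reference_is_indexed() {
    ref="$1"
    for ext in amb ann bwt pac sa; do
        [ -f "${ref}.${ext}" ] || return 1
    done
    return 0
}

# Hypothetical reference path; substitute your own
REFERENCE="/path/to/reference.fasta"

if reference_is_indexed "${REFERENCE}"; then
    echo "Index found: the next Read_Mapping run should map reads."
else
    echo "No complete index: expect Read_Mapping to index and exit, then run it again."
fi
```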

Read_Mapping generates aligned SAM files for each sample, located under

${OUT_DIR}/Read_Mapping/${SAMPLE}.sam

where ${OUT_DIR} is specified in the configuration file.
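
To show how the handler-specific variables and the output path fit together, the following sketch assembles a `bwa mem` command line for one paired-end sample and prints it without running it. The mapping of variables to `bwa mem` options is an assumption based on the descriptions in the table, not a copy of the Read_Mapping source; the paths, the sample name, and the REFERENCE, FORWARD, and REVERSE variables are hypothetical.

```shell
# Hypothetical paths and sample name
OUT_DIR="/path/to/output"
REFERENCE="/path/to/reference.fasta"
SAMPLE="SampleA"
FORWARD="/path/to/SampleA_Forward_ScytheTrimmed.fastq.gz"
REVERSE="/path/to/SampleA_Reverse_ScytheTrimmed.fastq.gz"

# Defaults from the table above
THREADS=8; SEED=8; WIDTH=100; DROPOFF=100; RE_SEED=1.0; CUTOFF=10000
MATCH=1; MISMATCH=4; GAP=8; EXTENSION=1; CLIP=6; UNPAIRED=9
RM_THRESHOLD=85
SPLIT=true

# Assumed correspondence between variables and bwa mem options
CMD="bwa mem -t ${THREADS} -k ${SEED} -w ${WIDTH} -d ${DROPOFF} -r ${RE_SEED} -c ${CUTOFF}"
CMD="${CMD} -A ${MATCH} -B ${MISMATCH} -O ${GAP} -E ${EXTENSION} -L ${CLIP} -U ${UNPAIRED}"
CMD="${CMD} -T ${RM_THRESHOLD}"

# -M marks shorter split hits as secondary
if [ "${SPLIT}" = "true" ]; then
    CMD="${CMD} -M"
fi

CMD="${CMD} ${REFERENCE} ${FORWARD} ${REVERSE}"
OUTPUT="${OUT_DIR}/Read_Mapping/${SAMPLE}.sam"

# Print the command rather than executing it
echo "${CMD} > ${OUTPUT}"
```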

A list of files is not generated from Read_Mapping. However, you can generate one using sample_list_generator.sh.
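
If you would rather build the list by hand, a minimal alternative is to collect the SAM files with `find`. This sketch does not reproduce what sample_list_generator.sh does; the temporary OUT_DIR, the empty placeholder SAM files, and the `sam_list.txt` name are hypothetical and exist only to make the example self-contained.

```shell
# Hypothetical output directory with two placeholder SAM files
OUT_DIR="$(mktemp -d)"
mkdir -p "${OUT_DIR}/Read_Mapping"
touch "${OUT_DIR}/Read_Mapping/SampleA.sam" "${OUT_DIR}/Read_Mapping/SampleB.sam"

# Write the full path of every SAM file, one per line, in sorted order
SAM_LIST="${OUT_DIR}/Read_Mapping/sam_list.txt"
find "${OUT_DIR}/Read_Mapping" -name "*.sam" | sort > "${SAM_LIST}"

cat "${SAM_LIST}"
```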

Dependencies

Read_Mapping depends on the Burrows-Wheeler Aligner and the Portable Batch System to run. If you want to use a different job scheduler or read mapper, you will need to modify this script extensively. Future implementations of Read_Mapping using Bowtie 2 are under consideration. Please check the dependencies page to ensure that you are using the required version of each dependency.