
Adapter_Trimming

Skylar Wyant edited this page Aug 1, 2016 · 28 revisions

Basic Usage

The Adapter_Trimming handler removes adapter sequences from sample reads. It uses Scythe, which can be run on paired-end or single-end data. Instead of allowing a fixed number of mismatches, Scythe uses a Bayesian model with a prior contaminant estimate. For more information, please read the Scythe documentation. To run Adapter_Trimming, all common and handler-specific variables must be defined in the configuration file. Once the variables have been defined, Adapter_Trimming can be submitted to a job scheduler with the following command:

./sequence_handling Adapter_Trimming Config

where Config is the full file path to the configuration file.
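For orientation, the sketch below shows roughly what trimming a single FastQ file with Scythe involves, using the ADAPTERS and PRIOR variables from the configuration file. It is an illustration under assumptions, not the handler's actual code: the function name trim_fastq, the DRY_RUN switch, and the _trimmed.fastq output name are hypothetical, and the real handler may pass other options. The -a (adapter file), -p (prior), and -o (output file) options are Scythe's own. In this sketch, a paired-end sample would be handled by calling the function once on the forward file and once on the reverse file.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical, not the handler's code): trim adapters from one
# FastQ file with Scythe, then gzip the result.
# Assumes ADAPTERS and PRIOR are set as in the configuration file.
trim_fastq() {
    local in_fastq="$1" out_dir="$2"
    local sample
    # Strip the directory and the .fastq extension to get the sample name
    sample="$(basename "${in_fastq}" .fastq)"
    local out_fastq="${out_dir}/${sample}_trimmed.fastq"
    # -a: adapter sequences, -p: prior contaminant estimate, -o: output file
    local cmd=(scythe -a "${ADAPTERS}" -p "${PRIOR}" -o "${out_fastq}" "${in_fastq}")
    if [[ -n "${DRY_RUN:-}" ]]; then
        # Print the commands instead of running them
        printf '%s\n' "${cmd[*]}" "gzip ${out_fastq}"
        return 0
    fi
    # Compress the trimmed output only if Scythe succeeded
    "${cmd[@]}" && gzip "${out_fastq}"
}
```

For example, `DRY_RUN=1 trim_fastq sample1_R1.fastq /scratch/out` prints the Scythe and gzip commands without running them, which is a cheap way to check the ADAPTERS and PRIOR values before a real run.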

Handler-Specific Variables

The following variables must be defined within Config. In addition to these handler-specific variables, all common variables must also be defined.

Variable Function
AT_QSUB qsub resource settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=8,walltime=10:00:00".
FORWARD_NAMING Suffix for forward reads. If your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq
REVERSE_NAMING Suffix for reverse reads. If your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq
ADAPTERS A plain text or FastA file containing the adapter sequences.
PRIOR The prior contaminant estimate passed to Scythe.

Note: If your samples are single-end, set FORWARD_NAMING and REVERSE_NAMING to values that do NOT match any of your sample file names.
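A minimal sketch of the Adapter_Trimming portion of Config might look like the following. The paths are placeholders, the PRIOR value is only an example (choose it from your own contaminant estimate), the single-end suffixes are made-up strings chosen not to match any file name, and the common variables required by all handlers are omitted.

```shell
# Adapter_Trimming settings (placeholder values; common variables not shown)
AT_QSUB="mem=1gb,nodes=1:ppn=8,walltime=10:00:00"

# Paired-end files named like sample1_R1.fastq / sample1_R2.fastq
FORWARD_NAMING="_R1.fastq"
REVERSE_NAMING="_R2.fastq"
# For single-end samples, use suffixes that match none of your file names, e.g.
# FORWARD_NAMING="_NO_FORWARD_MATCH"
# REVERSE_NAMING="_NO_REVERSE_MATCH"

# Adapter sequences (plain text or FastA) and Scythe's prior
ADAPTERS="/path/to/adapters.fa"
PRIOR="0.4"
```

Because Config must be given as a full path, running `./sequence_handling Adapter_Trimming "$(pwd)/Config"` from the directory that holds Config is one simple way to satisfy that requirement.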

Output

Adapter_Trimming creates a trimmed, gzipped FastQ file for each sample. It also writes a list of all trimmed files for use with other handlers. The full file path to this list is

${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt

where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
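Before handing the list to another handler, it can be worth confirming that every listed file exists. The sketch below assumes the list holds one file path per line and that OUT_DIR and PROJECT are set as in the configuration file; the function name check_trimmed_list is hypothetical and not part of sequence_handling.

```shell
#!/usr/bin/env bash
# Sketch: print each trimmed file named in the Adapter_Trimming list and
# warn about any that are missing. Returns 1 if the list or any file is absent.
# Assumes one path per line and OUT_DIR/PROJECT set from Config.
check_trimmed_list() {
    local list="${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
    [[ -f "${list}" ]] || { echo "Missing sample list: ${list}" >&2; return 1; }
    local path missing=0
    # The || clause also handles a final line without a trailing newline
    while IFS= read -r path || [[ -n "${path}" ]]; do
        [[ -z "${path}" ]] && continue   # ignore blank lines
        if [[ -f "${path}" ]]; then
            printf '%s\n' "${path}"
        else
            echo "Listed file not found: ${path}" >&2
            missing=1
        fi
    done < "${list}"
    return "${missing}"
}
```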

After running Adapter_Trimming, there are three options for further processing.

  1. Quality_Assessment can be used for more complete quality assurance.
  2. Quality_Trimming can be used to trim low quality bases.
  3. Read_Mapping can be used to map reads to a reference genome. Read mapping with BWA-MEM generally does not require low quality bases to be removed.

Dependencies

Adapter_Trimming depends on Scythe to perform the trimming. Scythe is not installed on MSI and must be installed separately. PBS and GNU Parallel are also required.
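A quick check that the required programs are on PATH can save a failed job. The sketch below assumes the executables are named scythe, parallel (GNU Parallel), and qsub (PBS); the function name check_dependencies is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report any required executables that are not on PATH.
# With no arguments, checks the Adapter_Trimming dependencies; otherwise
# checks the command names passed in. Returns 1 if any are missing.
check_dependencies() {
    local deps=("$@")
    (( ${#deps[@]} )) || deps=(scythe parallel qsub)
    local dep missing=0
    for dep in "${deps[@]}"; do
        if ! command -v "${dep}" > /dev/null 2>&1; then
            echo "Missing dependency: ${dep}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Calling `check_dependencies || exit 1` at the top of a submission script stops early with a message naming each missing program.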