Adapter_Trimming

Skylar Wyant edited this page Oct 9, 2018 · 28 revisions

Basic Usage

The Adapter_Trimming handler trims adapter sequences from sample reads. It uses Scythe, which can be run on paired-end or single-end data. Adapter_Trimming takes FastQ or gzipped FastQ files as input and returns gzipped FastQ files. Rather than allowing a fixed number of mismatches, Scythe uses a Bayesian model with a prior estimate of the contamination rate to decide whether a read contains adapter sequence. For more information, please read the Scythe documentation.
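The exact command the handler builds is not shown on this page. As a rough sketch, assuming Scythe's documented `-a` (adapter file), `-p` (prior) and `-o` (output file) options, a single-file invocation might look like the one printed below. The helper name `scythe_command` and all file names are hypothetical, and the function only prints the command rather than running Scythe.

```shell
# Hypothetical helper, not part of sequence_handling: print (do not run) a
# Scythe command line for one FastQ file.
scythe_command() {
    local adapters="$1" prior="$2" input="$3" output="$4"
    # -a: adapter file, -p: prior contamination rate, -o: output file
    printf 'scythe -a %s -p %s -o %s %s\n' "$adapters" "$prior" "$output" "$input"
}

# Example with made-up file names
scythe_command adapters.fa 0.05 sample1_R1.fastq sample1_Forward_ScytheTrimmed.fastq
```

Printing the command first is a cheap way to see how ADAPTERS and PRIOR feed into Scythe before submitting a real job.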

To run Adapter_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Adapter_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Adapter_Trimming Config

Where Config is the full file path to the configuration file.

Handler-Specific Variables

The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.

| Variable | Function |
|----------|----------|
| AT_QSUB | QSub settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=4,walltime=10:00:00". |
| RAW_SAMPLES | The list of raw samples to be processed, which can be generated using sample_list_generator.sh. This should be a plain text file with one file path per line. |
| FORWARD_NAMING | Shared suffix for forward reads. Example: If your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq |
| REVERSE_NAMING | Shared suffix for reverse reads. Example: If your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq |
| ADAPTERS | A plain text or FastA file with the adapter sequences. These sequences depend on the technology and platform used for sequencing, but the most common adapters for various platforms can be found online. |
| PRIOR | A prior contamination estimate for Scythe. Scythe's documentation suggests starting at 0.05 and then experimenting as needed. |
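As an illustration only, the handler-specific part of Config might look like the fragment below, assuming Config uses shell-style `VARIABLE=value` assignments as in the FORWARD_NAMING example. The two file paths are hypothetical placeholders; the other values are the recommendations and examples given for each variable.

```shell
# Handler-specific settings for Adapter_Trimming (illustrative values only)
AT_QSUB="mem=1gb,nodes=1:ppn=4,walltime=10:00:00"   # recommended QSub settings
RAW_SAMPLES=/path/to/raw_samples.txt                # hypothetical path: one FastQ path per line
FORWARD_NAMING=_R1.fastq                            # suffix shared by forward reads
REVERSE_NAMING=_R2.fastq                            # suffix shared by reverse reads
ADAPTERS=/path/to/adapters.fa                       # hypothetical path: adapter sequences
PRIOR=0.05                                          # starting prior suggested by Scythe's documentation
```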

Note: If you have single-end samples, leave FORWARD_NAMING and REVERSE_NAMING set to values that do not match your sample file names. If none of your samples match the forward or reverse naming suffixes, Adapter_Trimming will automatically assume that the samples are single-end.
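Because the paired-end versus single-end decision depends on suffix matching, it can help to count how many entries in the sample list end with each suffix before submitting a job. The following is a minimal sketch and not part of sequence_handling; the function name `count_suffix` is hypothetical.

```shell
# Hypothetical helper: count the paths in a sample list that end with a suffix.
count_suffix() {
    local list="$1" suffix="$2" n=0 line
    # "|| [ -n ... ]" also handles a final line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *"$suffix") n=$((n + 1)) ;;   # literal suffix match, no regex
        esac
    done < "$list"
    echo "$n"
}

# Usage, with the variables from Config already set in the shell:
#   count_suffix "$RAW_SAMPLES" "$FORWARD_NAMING"
#   count_suffix "$RAW_SAMPLES" "$REVERSE_NAMING"
```

For paired-end data the two counts would normally be equal and non-zero; two zeros mean every sample would be treated as single-end.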

Output

Adapter_Trimming creates a trimmed, gzipped FastQ file for each sample. If you have paired-end data, then each sample should have a samplename_Forward_ScytheTrimmed.fastq.gz and a samplename_Reverse_ScytheTrimmed.fastq.gz. If you are only getting samplename_Single_ScytheTrimmed.fastq.gz, then you have likely entered FORWARD_NAMING and/or REVERSE_NAMING incorrectly.
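A quick way to spot this problem is to count the output files of each kind. The sketch below assumes all trimmed files sit in one directory that you pass as the first argument; the function name `count_outputs` is hypothetical and not part of sequence_handling.

```shell
# Hypothetical helper: count trimmed output files of one kind
# (Forward, Reverse or Single) in a directory.
count_outputs() {
    local dir="$1" kind="$2" n=0 f
    for f in "$dir"/*_"${kind}"_ScytheTrimmed.fastq.gz; do
        # An unmatched glob stays as a literal pattern, so test for existence
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage (the directory is an assumption about where the output lives):
#   count_outputs /path/to/trimmed_output Forward
#   count_outputs /path/to/trimmed_output Single
```

For paired-end data, a non-zero Single count with zero Forward and Reverse counts points at the naming variables.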

In addition, a list of all trimmed files is written for use with other handlers. The full file path to this list is:

${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt

where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
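Before passing this list to another handler, you may want to confirm that every file it names actually exists and is not empty. The following is a minimal sketch, assuming the list contains one file path per line; the function name `list_missing` is hypothetical and not part of sequence_handling.

```shell
# Hypothetical helper: print every path in a file list that is missing or empty.
list_missing() {
    local list="$1" path
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines in the list
        [ -z "$path" ] && continue
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$path" ]; then
            echo "$path"
        fi
    done < "$list"
}

# Usage, with OUT_DIR and PROJECT set as in Config:
#   list_missing "${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
```

No output means every listed file was found and is non-empty.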

After running Adapter_Trimming, there are three options for further processing.

  1. Quality_Assessment can be used to re-check adapter contamination for each sample to ensure quality. This is the recommended workflow.
  2. Quality_Trimming can be used to trim low quality bases. This is not necessary when reads are mapped with BWA-MEM in Read_Mapping, so it is recommended that you skip this step and go directly to Read_Mapping.
  3. Read_Mapping can be used to map paired-end or GBS reads to a reference genome with BWA-MEM.

Dependencies

Adapter_Trimming depends on Scythe to perform the trimming. Furthermore, PBS is required for operation. Please check the dependencies page to ensure that you are using the required version of each dependency.