Adapter_Trimming

Skylar Wyant edited this page Jun 3, 2016 · 28 revisions

Basic Usage

The Quality_Trimming.sh script trims low-quality regions from samples. It uses Sickle, Scythe, and Seqqs to perform the trimming, and it currently works only on paired-end data. All variables must be defined within the script itself before it is run: open Quality_Trimming.sh in a text editor and follow the instructions in the usage information section. Once the variables have been defined, Quality_Trimming.sh needs to be submitted to a job scheduler. The script is set up for PBS, so it is submitted with the following command:

qsub Quality_Trimming.sh

After the job has run, a list of trimmed FastQ files will be generated for use with read_mapping_start.sh. The path to this list is reported in the output file from the job submission.
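The submission step can be wrapped so that a missing script is caught early and the job identifier is kept for later reference. This is only a minimal sketch: the function name submit_trimming is hypothetical and is not part of sequence_handling, and it assumes that qsub prints the job identifier on standard output, as PBS normally does.

```shell
# Hypothetical helper, not part of sequence_handling.
# Submits the trimming script and prints the job identifier reported by qsub.
submit_trimming() {
    local script="${1:-Quality_Trimming.sh}" job_id
    # Fail early if the script is not where we expect it
    if [ ! -f "${script}" ]; then
        echo "Cannot find ${script}" >&2
        return 1
    fi
    # qsub prints the job identifier on standard output
    job_id="$(qsub "${script}")" || return 1
    echo "${job_id}"
}
```

Calling `job_id="$(submit_trimming)"` from the directory that holds the script stores the identifier, and `qstat "${job_id}"` can then be used to check on the job.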

Variables

The following variables need to be defined within Quality_Trimming.sh. The Line column gives the line of the script on which each variable is set.

| Variable | Line | Function |
| --- | --- | --- |
| Email | 5 | Sets an email address for notifications of job status |
| SEQUENCE_HANDLING | 66 | The full path to the directory in which sequence_handling is stored |
| SAMPLE_INFO | 69 | A list of samples to trim |
| FORWARD_NAMING | 75 | Extension for forward files |
| REVERSE_NAMING | 76 | Extension for reverse files |
| PROJECT | 79 | A name that describes the project you are working on |
| SCRATCH | 82 | A directory that will hold results |
| ADAPTERS | 85 | A plain text or FASTA file with the adapter sequences |
| PRIOR | 89 | A prior value for Scythe |
| THRESHOLD | 94 | The threshold for quality trimming in Sickle |
| PLATFORM | 97 | The platform used for sequencing. This can be found in the output files from Assess_Quality.sh |
| R Definition | 100-102 | Define the path to an R installation or load it from a cluster |
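Because every variable is edited by hand, it is easy to leave one empty. The sketch below shows one way to confirm that a set of variables has been given values before submitting the job. The function check_defined is hypothetical and is not part of Quality_Trimming.sh, and it assumes the variables in the table are ordinary shell variables set in the current shell.

```shell
# Hypothetical helper, not part of Quality_Trimming.sh.
# Reports every named variable that is unset or empty.
check_defined() {
    local missing=0 name value
    for name in "$@"; do
        # Look up the variable whose name is stored in ${name}
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "Variable ${name} is not defined" >&2
            missing=1
        fi
    done
    # Non-zero exit status if any variable was missing
    return "${missing}"
}
```

For example, `check_defined SAMPLE_INFO PROJECT SCRATCH ADAPTERS PRIOR THRESHOLD PLATFORM` returns a non-zero status and names each variable that still needs a value.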

Output

Quality_Trimming.sh creates trimmed FastQ files for each sample. It also generates trimming statistics to help assess quality both before and after trimming. It is still recommended that Assess_Quality.sh be used for more complete quality assurance. In addition, a list of all trimmed files will be output for use with other scripts.
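Before passing the list of trimmed files to another script, it can be worth confirming that every file named in it exists and is not empty. The following is a minimal sketch that assumes the list is a plain text file with one FastQ path per line; that format, and the function name check_trimmed_list, are assumptions rather than documented behavior.

```shell
# Hypothetical helper; assumes the list holds one FastQ path per line.
check_trimmed_list() {
    local list="$1" missing=0 fastq
    if [ ! -f "${list}" ]; then
        echo "Cannot find list: ${list}" >&2
        return 1
    fi
    # Read line by line, including a final line without a trailing newline
    while IFS= read -r fastq || [ -n "${fastq}" ]; do
        # Skip blank lines
        [ -z "${fastq}" ] && continue
        # -s is true only for files that exist and are not empty
        if [ ! -s "${fastq}" ]; then
            echo "Missing or empty: ${fastq}" >&2
            missing=1
        fi
    done < "${list}"
    return "${missing}"
}
```

The function exits with status 0 only when every listed file is present, so it can guard the next step of a pipeline.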

Dependencies

Quality_Trimming.sh depends on Sickle, Scythe, and Seqqs to perform the trimming. These are not installed on MSI and must be installed separately. The installer.sh script can download and install these three programs from GitHub when it is passed the install argument. PBS and GNU Parallel are also required to run the script. Finally, R is required for plotting trimming statistics.
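A quick way to see whether the dependencies are available is to check that each executable can be found on the PATH. This is a sketch only: check_dependencies is a hypothetical function, and the executable names used in the example call after the block (sickle, scythe, seqqs, parallel, qsub, Rscript) are the usual names for these tools, which may differ on a given system.

```shell
# Hypothetical helper; reports each named executable that is not on the PATH.
check_dependencies() {
    local missing=0 tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "${tool}" > /dev/null 2>&1; then
            echo "Not found on PATH: ${tool}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Running `check_dependencies sickle scythe seqqs parallel qsub Rscript` prints one line for each program that still needs to be installed or loaded.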

Helper Scripts: fix_quality.sh and plot_seqqs.R

Quality_Trimming.sh uses two helper scripts in the trimming process. fix_quality.sh adjusts the quality scores given by sequencing centers to a more realistic value using Awk. plot_seqqs.R creates quality comparison plots for each sample. These plots are stored in PDF documents.