Adapter_Trimming
The Adapter_Trimming handler trims adapter sequences off of sample reads. It uses Scythe, which can be run on paired-end or single-end data. Adapter_Trimming takes FastQ, FastA, or gzipped FastQ/FastA files as input and returns gzipped FastQ or FastA files. Rather than allowing a fixed number of mismatches, Scythe uses a Bayesian model with a prior contamination estimate. For more information, please read the Scythe documentation.
To run Adapter_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Adapter_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing `sequence_handling`):
./sequence_handling Adapter_Trimming Config
where `Config` is the full file path to the configuration file.
The following is a list of variables that need to be defined within `Config`. In addition to these handler-specific variables, all common variables must be defined.
Variable | Function |
---|---|
`AT_QSUB` | QSub settings for batch submission. Recommended settings are `"mem=1gb,nodes=1:ppn=8,walltime=10:00:00"`. |
`FORWARD_NAMING` | Shared suffix for forward reads. If your files are named `sample1_R1.fastq` and `sample2_R1.fastq`, then `FORWARD_NAMING=_R1.fastq`. |
`REVERSE_NAMING` | Shared suffix for reverse reads. If your files are named `sample1_R2.fastq` and `sample2_R2.fastq`, then `REVERSE_NAMING=_R2.fastq`. |
`ADAPTERS` | A plain text or FastA file with the adapter sequences. |
`PRIOR` | A prior contamination estimate for Scythe. Scythe's documentation suggests starting at 0.05 and then experimenting as needed. |
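As an illustration, the handler-specific part of a configuration file might look like the sketch below. The variable names come from the table; the adapter file path is a hypothetical placeholder, and the common variables (which must also be defined) are omitted.

```shell
# Hypothetical excerpt of a Config file: Adapter_Trimming variables only.
# All paths and values are placeholders to adapt to your project.

# Scheduler settings, using the recommended values
AT_QSUB="mem=1gb,nodes=1:ppn=8,walltime=10:00:00"

# Suffixes shared by all forward and all reverse read files
FORWARD_NAMING=_R1.fastq
REVERSE_NAMING=_R2.fastq

# Adapter sequences (plain text or FastA); hypothetical path
ADAPTERS=/path/to/adapters.fa

# Prior contamination estimate passed to Scythe (suggested starting point)
PRIOR=0.05
```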
Note: If you have single-end samples, leave `FORWARD_NAMING` and `REVERSE_NAMING` filled with values that do NOT match your samples. If none of your samples match the forward or reverse naming suffixes, Adapter_Trimming will automatically assume that the samples are single-end.
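The suffix matching described in this note can be pictured with the following sketch. It is not the handler's actual code: `classify_read` is a hypothetical helper and the sample names are made up. It only shows how a file name that ends in neither suffix falls through to the single-end case.

```shell
#!/bin/bash
# Illustrative only: classify file names by their suffix.
# classify_read is a hypothetical helper, not part of sequence_handling.

FORWARD_NAMING=_R1.fastq
REVERSE_NAMING=_R2.fastq

# Print "forward", "reverse", or "single" for one file name.
classify_read() {
    local name="$1"
    case "$name" in
        *"$FORWARD_NAMING") echo forward ;;
        *"$REVERSE_NAMING") echo reverse ;;
        *) echo single ;;   # matches neither suffix
    esac
}

# Made-up sample names for demonstration
for f in sample1_R1.fastq sample1_R2.fastq sample2.fastq.gz; do
    printf '%s\t%s\n' "$f" "$(classify_read "$f")"
done
```

Note that the suffix must match the end of the file name exactly, so gzipped inputs would need a suffix such as `_R1.fastq.gz`.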
Adapter_Trimming creates a trimmed gzipped FastQ file for each sample. In addition, a list of all trimmed files will be output for use with other handlers. The full file path to this list will be `${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt`, where `${OUT_DIR}` and `${PROJECT}` are specified in the configuration file.
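Before passing the list to another handler, it can be worth confirming that every file it names exists and is non-empty. The function below is a minimal sketch of such a check; `check_trimmed_list` is a hypothetical helper, and it assumes the list holds one file path per line.

```shell
#!/bin/bash
# Hypothetical helper: report entries in a trimmed-file list that are
# missing or empty. Assumes one file path per line.
# Prints the number of problem entries on stdout, details on stderr.
check_trimmed_list() {
    local list="$1" missing=0 f
    while IFS= read -r f || [ -n "$f" ]; do
        [ -z "$f" ] && continue              # skip blank lines
        if [ ! -s "$f" ]; then               # -s: exists and has size > 0
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing"
}

# Example call, assuming OUT_DIR and PROJECT are set as in the configuration file:
# check_trimmed_list "${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
```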
After running Adapter_Trimming, there are three options for further processing.
- Quality_Assessment can be used for more complete quality assurance.
- Quality_Trimming can be used to trim low-quality bases. This is generally not necessary when running Read_Mapping with BWA-MEM. Future implementations of Read_Mapping using Bowtie 2 may require Quality_Trimming.
- Read_Mapping can be used to map paired-end or GBS reads to a reference genome.
Adapter_Trimming depends on Scythe to perform the trimming. Scythe is not installed on MSI and must be installed separately. PBS and GNU Parallel are also required.
Next: Quality_Trimming or Read_Mapping