Quality_Assessment
Before beginning sequence_handling, make sure that your FastQ samples have been merged (if individual samples are split across multiple files) and renamed. It will be much harder to merge and/or rename files later in the pipeline.
The Quality_Assessment handler runs FastQC on a list of FastQ, SAM, or BAM samples. FastQC can process any type of FastQ encoding and can handle sample inputs that are gzipped or bzipped. Running Quality_Assessment will produce an HTML document for each sample and a summary file for all samples containing metrics on the sequence quality, sequence length distribution, sequence duplication levels, adapter content, and other quality statistics. For more information on these metrics, view the FastQC documentation.
To run Quality_Assessment, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quality_Assessment can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):
./sequence_handling Quality_Assessment Config
Where Config is the full file path to the configuration file.
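Because Config must be given as a full path, a small helper can turn a relative path into an absolute one before submission. The following is only a sketch: the function name `full_path` is hypothetical and not part of sequence_handling, and it assumes the file already exists.

```shell
#!/bin/bash
# Print the absolute path of an existing file, given a relative or absolute path.
# full_path is a hypothetical helper, not something sequence_handling provides.
full_path() {
    local dir
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    echo "${dir}/$(basename "$1")"
}

# Example use (left commented out because it would submit jobs):
# ./sequence_handling Quality_Assessment "$(full_path Config)"
```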
The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.
Variable | Function |
---|---|
QA_QSUB | QSub settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=4,walltime=6:00:00". |
QA_SAMPLES | The list of FastQ, SAM, or BAM samples to be processed, which can be generated using sample_list_generator.sh. This should be a plain text file with one file path per line. |
TARGET | The size of the region that was sequenced, in base pairs. For whole-genome sequencing, this is the genome size. For exome capture, this is the size of the capture region. If you do not have this information, put "NA". |
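As a rough illustration of how these variables fit together, the sketch below builds a sample list with `find` (a stand-in for sample_list_generator.sh, not a copy of it) and then sets the three handler-specific variables. The directory, file names, and the `MISSING` counter are assumptions made for the example; a temporary directory with empty placeholder files is used so that it runs anywhere.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical directory holding raw reads. Empty placeholder files are
# created here only so the sketch is self-contained.
READS_DIR="$(mktemp -d)"
touch "${READS_DIR}/sample_A.fastq.gz" "${READS_DIR}/sample_B.fastq.gz"

# Write one file path per line, sorted so the list is reproducible.
SAMPLE_LIST="${READS_DIR}/qa_samples.txt"
find "${READS_DIR}" -type f -name '*.fastq.gz' | sort > "${SAMPLE_LIST}"

# Handler-specific variables as they might be set in Config.
QA_QSUB="mem=1gb,nodes=1:ppn=4,walltime=6:00:00"
QA_SAMPLES="${SAMPLE_LIST}"
TARGET="NA"   # or the size of the sequenced region in base pairs

# Sanity check: every listed path should exist and be readable.
MISSING=0
while IFS= read -r sample; do
    [ -r "${sample}" ] || { echo "Cannot read: ${sample}" >&2; MISSING=$((MISSING + 1)); }
done < "${QA_SAMPLES}"
echo "Listed samples: $(wc -l < "${QA_SAMPLES}" | tr -d ' '), unreadable: ${MISSING}"
```

Checking the list before submission is optional, but it catches a mistyped path before a job is queued.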
Quality_Assessment will output an HTML file and a zip file for each sample in your raw sample list using FastQC. To view the HTML files, open them in a web browser.
After Quality_Assessment has completed, a tab-delimited text file and a PNG file of plots will also be generated that summarize the quality statistics for each sample. The full file paths to these files will be
${OUT_DIR}/Quality_Assessment/${PROJECT}_quality_summary.txt
${OUT_DIR}/Quality_Assessment/${PROJECT}_quality_plots.png
where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
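One way to confirm that the summary outputs were written is to rebuild the two paths from the same variables and test for them. This is a minimal sketch: the fallback values for `OUT_DIR` and `PROJECT` are placeholders, and `FOUND` is a hypothetical counter used only here.

```shell
#!/bin/bash
set -euo pipefail

# Placeholder values; in practice these come from Config.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
PROJECT="${PROJECT:-My_Project}"

QA_DIR="${OUT_DIR}/Quality_Assessment"
SUMMARY="${QA_DIR}/${PROJECT}_quality_summary.txt"
PLOTS="${QA_DIR}/${PROJECT}_quality_plots.png"

# Report which of the two summary outputs exist and are non-empty.
FOUND=0
for output in "${SUMMARY}" "${PLOTS}"; do
    if [ -s "${output}" ]; then
        echo "Found:   ${output}"
        FOUND=$((FOUND + 1))
    else
        echo "Missing: ${output}"
    fi
done

# If the summary is present, align its tab-delimited columns for the terminal.
if [ -s "${SUMMARY}" ]; then
    column -t -s $'\t' "${SUMMARY}"
fi
```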
Quality_Assessment depends on FastQC, Riss-util, PBS, and GNU Parallel to run. All of these are available through MSI. For those not on MSI, please download and install these separately or modify the script to work with your tools. Please check the dependencies page to ensure that you are using the required version of each dependency.
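For those not on MSI, a quick check that the command-line tools are on `PATH` might look like the sketch below. The function name `check_tools` is hypothetical. It assumes the usual executable names `fastqc`, `parallel` (GNU Parallel), and `qsub` (PBS); Riss-util is left out because its executable names are not given here, and tool versions are not checked.

```shell
#!/bin/bash
# Report which of the named executables can be found on PATH.
# Returns the number of tools that were not found (0 means all present).
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "${tool}" > /dev/null 2>&1; then
            echo "found:   ${tool}"
        else
            echo "missing: ${tool}"
            missing=$((missing + 1))
        fi
    done
    return "${missing}"
}

check_tools fastqc parallel qsub \
    || echo "Install or load the missing tools before running Quality_Assessment."
```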
Next: Adapter_Trimming