Recommended_Workflow

Workflow

1. Quality_Assessment

To start, run Quality_Assessment on your raw FastQ files. The Quality_Assessment handler runs FastQC on a series of samples and outputs metrics used for quality control. It accepts FASTQ, SAM, and BAM files as input and outputs a summary table and individual HTML files for visualization. The Quality_Assessment handler depends on FastQC and GNU Parallel.

2. Adapter_Trimming

The Adapter_Trimming handler uses Scythe to trim specific adapter sequences from FastQ files. This handler differentiates between forward, reverse, and single-end FastQ files automatically. The Adapter_Trimming handler depends on Scythe and GNU Parallel.

Quality_Assessment

After Adapter_Trimming, it is recommended to run Quality_Assessment again on the trimmed FastQ files to ensure that all adapter contamination was properly removed.

3. Read_Mapping

The Read_Mapping handler maps sequence reads to a reference genome using BWA-MEM. This handler uses Torque Task Arrays, part of the Portable Batch System. The Read_Mapping handler depends on the Burrows-Wheeler Aligner.

4. SAM_Processing with Picard

The SAM_Processing handler converts the SAM files from read mapping with BWA to the BAM format using SAMTools. In the conversion process, it will sort and deduplicate the data for the finished BAM file, also using SAMTools. Alignment statistics will also be generated for both raw and finished BAM files. The SAM_Processing handler depends on SAMTools and GNU Parallel.

5. Coverage_Mapping

The Coverage_Mapping handler generates coverage histograms and summary statistics from BAM files using BEDTools. Plots of coverage are generated using R based on coverage maps. The Coverage_Mapping handler depends on BEDTools, R, and GNU Parallel.

6. Haplotype_Caller

To begin the variant discovery process from your finished BAM files, the Haplotype_Caller handler uses GATK to generate genomic VCF files for each sample.

7. Genotype_GVCFs

The Genotype_GVCFs hander converts the GVCF files for the entire dataset into VCF files broken up by chromosome or chromosome part using GATK. Breaking the output into chromosome parts allows the process to be split into a task array and greatly speeds up processing time.

8. Create_HC_Subset

The Create_HC_Subset handler creates a single VCF file that contains only the high-confidence sites for your samples. This filtering is performed in multiple steps using several different user-defined parameters and before-and-after percentile tables are generated. Create_HC_Subset depends on VCFtools and vcflib for manipulating the VCF file.

9. Variant_Recalibrator

The Variant_Recalibrator handler uses the GATK and user-provided prior sets of "truth" variants to create a model that attempts to separate true variants from false positives. An unfiltered VCF file the the FILTER field annotated is generated.

10. Variant_Filtering

The Variant_Filtering handler creates a single variant call format (VCF) file that contains only high-quality sites and genotypes for your samples. This filtering is performed in multiple steps using several different user-defined parameters and before-and-after percentile tables are generated. Variant_Filtering depends on VCFtools and vcflib for manipulating the VCF file.

11. Variant_Analysis

The Variant_Analysis handler uses a variety of dependencies to produce statistics about the input VCF file. Information generated by the handler includes heterozygosity summaries, missing-ness summaries, a minor allele frequency histogram, the Ts/Tv ratio, and the raw count of SNPs. Additional information is output for barley samples. Variant_Analysis depends on VCFtools, vcflib, molpopgen, Python3, GNU Parallel, BCFtools, R, TeX Live, and the Enthought Python Distribution.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly