SAM_Processing
The SAM_Processing handler sorts, de-duplicates, adds read groups to, and merges the SAM files produced by Read_Mapping into one finished BAM file. This script uses Picard to carry out all processing of the SAM files. In addition, it generates before-and-after statistics using the flagstat function of SAMTools.
To run SAM_Processing, all common variables and handler-specific variables must be defined within the config file. Once the variables have been defined, SAM_Processing can be run with the following command:
sequence_handling SAM_Processing Config
Where Config is the full file path to the configuration file.
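Because Config must be given as a full path, it can help to check the path before submitting a job. The following is a minimal sketch, not part of sequence_handling itself; the helper name `is_full_config_path` and the example path are hypothetical.

```shell
# Hypothetical helper: succeeds only if the argument is an absolute path
# (starts with "/") that points to an existing, readable file.
is_full_config_path() {
    case "$1" in
        /*) [ -f "$1" ] && [ -r "$1" ] ;;
        *)  return 1 ;;
    esac
}

# Example usage (the path below is a placeholder):
# CONFIG="/home/user/project/Config"
# if is_full_config_path "$CONFIG"; then
#     sequence_handling SAM_Processing "$CONFIG"
# else
#     echo "Config must be a full path to a readable file: $CONFIG" >&2
# fi
```

A relative path such as `./Config` fails this check even when the file exists, which matches the requirement stated above.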
Future features: After the job has run, a list of sorted, deduplicated, and read grouped BAM files will be generated in addition to the merged BAM file.
The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.
Variable | Function | Method |
---|---|---|
METHOD | Which program should be used to process the SAM files. Choose from 'picard' (recommended) or 'samtools'. | Picard and SAMtools |
SP_QSUB | QSub settings for batch submission. Recommended settings are "mem=12gb,nodes=1:ppn=8,walltime=24:00:00". | Picard and SAMtools |
MAPPED_DIRECTORY | The full file path to the directory containing the read-mapped samples. If using Read_Mapping, leave as "${OUT_DIR}/Read_Mapping". | Picard and SAMtools |
PICARD_JAR | The full file path to the Picard jar file. | Picard |
MAX_MEM | The maximum amount of memory that can be used, formatted like 15g. | Picard |
MAX_FILES | The maximum number of file handles that can be used. On UNIX systems, the per-process maximum number of open files can be found with `ulimit -n`. Set slightly under this value. | Picard |
TMP | An optional variable that tells Picard where to store temporary files. Only use if you've had issues running out of temp space. Otherwise, leave blank. | Picard |
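As an illustration, the handler-specific part of Config might look like the sketch below. All paths are placeholders, and the helper `suggest_max_files` with its default margin of 24 is a hypothetical way to pick a value "slightly under" the `ulimit -n` limit; it is not part of sequence_handling.

```shell
# Hypothetical helper: print a MAX_FILES suggestion that is MARGIN below
# the given open-file limit. Fails if the limit is not a plain number
# (for example "unlimited") or is not larger than the margin.
suggest_max_files() {
    limit="$1"
    margin="${2:-24}"   # assumed safety margin, adjust as needed
    case "$limit" in
        ''|*[!0-9]*) echo "limit is not numeric: $limit" >&2; return 1 ;;
    esac
    if [ "$limit" -le "$margin" ]; then
        echo "limit too small: $limit" >&2
        return 1
    fi
    echo $((limit - margin))
}

# Example SAM_Processing settings (placeholder values only)
OUT_DIR="/path/to/output"            # common variable, shown here only so the fragment is self-contained
METHOD='picard'
SP_QSUB="mem=12gb,nodes=1:ppn=8,walltime=24:00:00"
MAPPED_DIRECTORY="${OUT_DIR}/Read_Mapping"
PICARD_JAR="/path/to/picard.jar"
MAX_MEM="15g"
MAX_FILES="$(suggest_max_files 1024)"  # assumes `ulimit -n` reports 1024
TMP=""                                # leave blank unless temp space has been a problem
```

On a real system, run `ulimit -n` on the compute node where the job will execute, since the limit there may differ from the login node.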
SAM_Processing creates sorted, deduplicated BAM files that have read groups marked. It also generates a merged BAM file for downstream tasks such as variant calling, alignment statistics for all input SAM files, and the finished unmerged BAM files. Finally, a list of all finished unmerged BAM files is generated.
SAM_Processing depends on Picard and Java 1.8 for all processing needs as well as SAMTools 1.3.1 for generating the alignment statistics. In addition, PBS and GNU Parallel are required for basic running.
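Before submitting a job, a quick check that the required programs are on the PATH can save a failed run. The sketch below is an assumption-laden example: the helper `missing_commands` is hypothetical, and the command names `java`, `samtools`, `parallel`, and `qsub` are the usual executable names for these tools but may differ on your system. It does not check versions.

```shell
# Hypothetical helper: print the name of each given command that
# cannot be found on the PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example usage (command names are assumptions, see above):
# missing=$(missing_commands java samtools parallel qsub)
# if [ -n "$missing" ]; then
#     echo "Missing dependencies:" >&2
#     echo "$missing" >&2
# fi
```

Picard itself is a jar file rather than a command, so it is located through PICARD_JAR in Config instead of the PATH.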
Next: Coverage_Mapping