Skip to content

Latest commit

 

History

History
196 lines (132 loc) · 12.7 KB

ATAC_peaks.md

File metadata and controls

196 lines (132 loc) · 12.7 KB

List of processes

Preprocessing

ATAC_peaks__calling_peaks

Description

Inputs are reads in bed files that are 1 base pair long (the 5' end), and that have been adjusted for the shift of the transposase.
Peaks are called with MACS2. The macs2 callpeak function is called with these arguments: -f BED -nomodel --shift -75 --extsize 150 --call-summits.
The --call-summits argument allows to call multiple summits for each peaks. This allows to split peaks in the next process.
The --extsize 150 argument allows to extend the reads by 150 bp in the 3' direction. This way fragments are centered in the original 5' location.
The combination of the --shift and --extsize arguments are often recommended for ATAC-Seq data (see these links for references and explanations: 1, 2, 3, 4).
We use a size of 150 base pair as it is approximately the size of a nucleosome.

Note: the two ends of each read pair are analyzed separately as they both provide valuable and separate information. See here.

Parameters

  • params.macs2__qvalue: q-value (minimum FDR) cutoff to call significant peaks with macs2. Default: '5e-2'.

Outputs

  • Raw peaks: Processed_Data/1_Preprocessing/ATAC__peaks__raw/${sample}__macs2_peaks.narrowPeak if params.save_bed_type = 'all'.

ATAC_peaks__splitting_multi_summits_peaks

Description

MACS2 peaks with multiple summits are split, with a boundary set in the middle of neighboring summits.
The script from this process was written by Aaron C. Daugherty for the first ATAC-Seq paper in C. elegans.

Outputs

  • Split peaks: Processed_Data/1_Preprocessing/ATAC__peaks__split/${sample}__split_peaks.narrowPeak if params.save_bed_type = 'all'.

ATAC_peaks__removing_blacklisted_regions

Description

Any peak that has any overlap with a blacklisted region is discarded.

Outputs

  • Kept and discarded peaks if params.save_bed_type = 'all':
    • ${sample}__peaks_kept_after_blacklist_removal.bed
    • ${sample}__peaks_lost_after_blacklist_removal.bed.

Output folders

  • Processed_Data/1_Preprocessing/ATAC__peaks__split__no_BL.

ATAC_peaks__removing_input_control_peaks

Description

If an input control is included in the experiment, and params.use_input_control is true, then peaks overlapping with input control peaks are removed.

Parameters

  • params.input_control_overlap_portion: threshold of the fraction of overlapping input control peaks to remove peaks. The percentage is regarding the treatment/sample peaks, not the input control peaks. Default: 0.2.

Outputs

  • Kept and discarded peaks if params.save_bed_type = 'all':
    • ${sample}__peaks_kept_after_input_control_removal.bed
    • ${sample}__peaks_lost_after_input_control_removal.bed.

Output folders

  • Processed_Data/1_Preprocessing/ATAC__peaks__split__no_BL_input.

ATAC_peaks__removing_specific_regions

Description

This process takes as input peaks from a comparison and remove any peaks that are in a region to remove for any of the two sample_id to compare. This is mostly useful for RNAi experiments that induce a large signal in ATAC-Seq, but it can be used to remove any arbitrary region.

Note: the reason why this process is here and not upstream (before aggregating replicates and comparisons) is because we want to remove in all bed files the peaks that are in specific regions (i.e. RNAi) that we want to avoid. This is because, Diffbind (i.e. the Differential Binding Analysis (DBA) tool) will consider all peaks for his analysis (i.e. differential abundance for ATAC-Seq), so if we remove one such peak in just one of the two samples to compare, if it is still present in the other sample then it will be included in the analysis and it will likely be found as differential bound during the DBA. e.g.: let's say we compare daf-16 RNAi vs control. If there is a MACS2 peak at daf-16 in one of the control condition's replicate, then even if we remove this peak in the daf-16 RNAi condition's replicates, it will still be included in the final analysis.

Parameters

  • params.design__regions_to_remove: path to the file containing the regions to remove (see the Design section for details). Default: 'Design/regions_to_remove.tsv'.

Outputs

  • Kept and discarded peaks if params.save_bed_type = 'all' or 'last':
    • ${sample}__peaks_kept_after_specific_regions_removal.bed
    • ${sample}__peaks_lost_after_specific_regions_removal.bed.

Output folders

  • Processed_Data/1_Preprocessing/ATAC__peaks__split__no_BL_input_RNAi.

Quality Controls

ATAC_QC_peaks__computing_and_plotting_saturation_curve

Description

A saturation curve is made to help assessing if samples have been sequenced deep enough (see here for details on saturation curve analysis).
MACS2 is used to sample reads (1 base pair reads in bed format) from 10% to 100%, by 10% increase.
Then, peaks are called with MACS2 for each sample. Finally, a plot is made in R showing the number of peaks (y-axis) by sequencing depth (x-axis).

Parameters

  • params.do_saturation_curve: enable or disable this process. Default: true.

Outputs

  • Saturation curves:
    • Figures_Individual/1_Preprocessing/ATAC__peaks__saturation_curve/${sample}__saturation_curve.pdf
    • Figures_Merged/1_Preprocessing/ATAC__peaks__saturation_curve.pdf.

ATAC_QC_peaks__annotating_macs2_peaks

Description

Peaks are annotated with ChIPseeker. Each peak is assigned to its closest gene using the annotatePeak function.

Parameters

  • params.do_raw_peak_annotation: to enable or disable this process. Default: true. Parameters of the annotatePeak function:
  • params.chipseeker__promoter_up: promoter start; upstream from TSS site. Default: 1500.
  • params.chipseeker__promoter_down: promoter end; downstream from TSS site. Default: 500.
  • params.chipseeker__overlap: this parameter together with the params.chipseeker__ignore_overlap controls the genes to which peaks are assigned to. If params.chipseeker__overlap equals "all" and params.chipseeker__ignore_overlap equals 'FALSE' then if a peak overlaps to a genomic feature (i.e., exon, intron, 5'UTR, 3'UTR, CDS) it will be assigned to this gene. Otherwise, the peak will be assigned to the neighboring gene regardless of overlap with genomic features. Options: "all", "TSS". Default: 'all'.
  • params.chipseeker__ignore_overlap: this parameter together with the params.chipseeker__overlap controls the genes to which peaks are assigned to. If params.chipseeker__overlap equals "all" and params.chipseeker__ignore_overlap equals 'FALSE' then if a peak overlaps to a genomic feature (i.e., exon, intron, 5'UTR, 3'UTR, CDS) it will be assigned to this gene. Otherwise, the peak will be assigned to the neighboring gene regardless of overlap with genomic features. Options: "all", "TSS". Default: 'FALSE'.
  • params.chipseeker__annotation_priority: This parameter controls the order of priorities when there are overlaping features that overlap with the peak for assigning a genomic region for the "annotation" column. Default: "c('Promoter', '5UTR', '3UTR', 'Exon', 'Intron', 'Downstream', 'Intergenic')".
  • params.chipseeker__ignore_upstream: If 'TRUE' only annotate gene at the 3' of the peak. Options: "FALSE", "TRUE". Default: 'FALSE'.
  • params.chipseeker__ignore_downstream: If 'TRUE' only annotate gene at the 5' of the peak. Options: "FALSE", "TRUE". Default: 'FALSE'.

Outputs

  • Annotated peaks R objects: Processed_Data/1_Preprocessing/ATAC__peaks__annotated_rds/${sample}__annotated_peaks.rds.

ATAC_QC_peaks__plotting_annotated_macs2_peaks_for_each_sample

Description

Using ChIPseeker and ggplot2 to plot coverage and average profile around TSS for each sample (one plot type per sample).

Parameters

  • params.chipseeker__promoter_up: promoter start; upstream from TSS site. Default: 1500.
  • params.chipseeker__promoter_down: promoter end; downstream from TSS site. Default: 500.

Outputs

  • Coverage plots:
    • Figures_Individual/1_Preprocessing/ATAC__peaks__coverage/${sample}__peaks_coverage.pdf
    • Figures_Merged/1_Preprocessing/ATAC__peaks__coverage.pdf.

  • Average profile plots:
    • Figures_Individual/1_Preprocessing/ATAC__peaks__average_profile/${sample}__average_profile.pdf
    • Figures_Merged/1_Preprocessing/ATAC__peaks__average_profile.pdf.

ATAC_QC_peaks__plotting_annotated_macs2_peaks_for_all_samples_grouped

Description

Using ChIPseeker and ggplot2 to plot coverage and average profile around TSS for all samples grouped (one plot type for all samples).

Parameters

  • params.chipseeker__promoter_up: promoter start; upstream from TSS site. Default: 1500.
  • params.chipseeker__promoter_down: promoter end; downstream from TSS site. Default: 500.

Outputs

Output folders

  • Figures_Individual/1_Preprocessing/ATAC__peaks__grouped_plots.
  • Figures_Merged/1_Preprocessing.