
List of processes

DA_split__splitting_differential_analysis_results_in_subsets
DA_split__plotting_venn_diagrams

DA_split__splitting_differential_analysis_results_in_subsets

Sub-workflow showing the creation of DASs (Differential Analysis Subsets).
Dotted arrows indicate optional additional filters. Abbreviations: FDR - False Discovery Rate, DAR - differentially accessible region, prom - promoter, distNC - distal non-coding region.

Diagrams showing how differential analysis results are split by experiment type and fold change filters.
Panel (a) shows the color code used in all panels, with blue circles representing genomic regions (either DARs or promoters of DEGs) and black circles representing gene sets (either the closest genes of DARs or DEGs). Enrichment of internal GRs and GSs indicates enrichment of GRs and GSs (i.e., DASs) in other GRs and GSs generated by the pipeline. Panels (b-g) show all possible splits of differential analysis results by experiment type (ATAC-Seq – turquoise, mRNA-Seq – orange, or both ATAC-Seq and mRNA-Seq – purple) and by fold change type (up – yellow, or down – green), with either an increase (b) or a decrease (c) in chromatin accessibility, an increase (d) or a decrease (e) in gene expression, and an increase (f) or a decrease (g) in both chromatin accessibility and gene expression. The HA-HE and HA-LE terminology was previously described by Nair et al. (2021). Black lines and blue circles represent DNA and nucleosomes, respectively. Orange lines represent mRNA molecules.

Description

This process splits Differential Analysis results into subsets (DAS, Differential Analysis Subsets) so that enrichment analysis can be run on each subset separately, examining the data from several angles.
Four filters are used to split the results:

ET: Experiment Type. Can be either 'ATAC', 'mRNA', 'both', 'both_ATAC', or 'both_mRNA'.
PA: DAR Peak Annotation. Can be any combination of 'all', 'gene', 'interG', 'prom', '5pUTR', '3pUTR', 'exon', 'intron', 'downst', 'distIn', 'UTR', 'TSS', 'genPro', 'distNC', 'mt10kb', 'mt100kb', 'mtYkb', 'lt10kb', 'lt100kb', 'ltXkb'. See DA_ATAC__saving_detailed_results_tables for details. 'all' disables this filter (all peaks are included).
FC: Fold Change type. To split up and down-regulated results.
TV: Threshold Value(s). To split results by significance thresholds.

NOTE: The 'both*' entries indicate results that pass the filters in both ATAC-Seq and mRNA-Seq. 'both' is used for gene lists (i.e., to find enriched ontologies), while 'both_ATAC' and 'both_mRNA' are used for genomic regions (i.e., to find enriched motifs/ChIP). 'both_ATAC' entries are ATAC-Seq peaks that pass the filters and are assigned to genes that also pass the filters in mRNA-Seq data. 'both_mRNA' entries are promoters of genes that pass the filters in mRNA-Seq and that have nearby ATAC-Seq peaks, assigned to the same gene, that also pass the filters.

NOTE: The process merges mRNA-Seq and ATAC-Seq results if experiment_types = 'both'; otherwise, it works on only one of the two.
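
The relationship between the three 'both*' experiment types can be illustrated with a minimal Python sketch. It is not the pipeline's code: the peak identifiers, gene names and variable names are hypothetical, and promoters are represented simply by their gene name.

```python
# Hypothetical inputs: entries that already pass the FC and TV filters.
# Each significant ATAC-Seq peak is paired with the gene it is assigned to.
atac_peaks = [
    ("peak_1", "geneA"),
    ("peak_2", "geneB"),
    ("peak_3", "geneB"),
]
# Genes that are significant in mRNA-Seq.
mrna_genes = {"geneB", "geneC"}

# Genes with at least one significant ATAC-Seq peak.
atac_genes = {gene for _, gene in atac_peaks}

# 'both': gene list passing the filters in both experiment types.
both = atac_genes & mrna_genes

# 'both_ATAC': significant peaks whose assigned gene is also significant in mRNA-Seq.
both_atac = [peak for peak, gene in atac_peaks if gene in both]

# 'both_mRNA': promoters of the genes in 'both' (represented here by gene name only).
both_mrna = sorted(both)

print(both, both_atac, both_mrna)
```

In this toy example only geneB is significant in both experiment types, so 'both_ATAC' holds its two peaks and 'both_mRNA' holds its single promoter.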

Finally, a key is made, of the form ${ET}__${PA}__${FC}__${TV}__${COMP}, with COMP indicating the comparison. This key is used to make:

bed files that contain genomic regions (i.e., to find enriched motifs/ChIP)
R files that contain gene sets (i.e., to find enriched ontologies and to plot Venn diagrams).
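
As an illustration of the key format, the following Python sketch builds and parses a key and derives the two output file names listed in the Outputs section. The helper names (make_key, parse_key) and the example values are hypothetical, and how the pipeline actually formats the TV component is an assumption.

```python
from typing import NamedTuple


class SubsetKey(NamedTuple):
    ET: str    # experiment type
    PA: str    # peak annotation
    FC: str    # fold change type
    TV: str    # threshold value (string form is an assumption)
    COMP: str  # comparison


def make_key(et, pa, fc, tv, comp):
    """Join the five components with a double underscore."""
    return "__".join([et, pa, fc, str(tv), comp])


def parse_key(key):
    """Split on the first four separators; the remainder is the comparison."""
    return SubsetKey(*key.split("__", 4))


key = make_key("ATAC", "prom", "up", 1.3, "treated_vs_control")
genes_file = f"{key}__genes.rds"
bed_file = f"{key}__regions.bed"
print(key)
print(genes_file)
print(bed_file)
```

Splitting on only the first four separators keeps the comparison name intact even if it contains underscores.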

In addition, two types of tables are produced: res_simple and res_filter. Both tables contain the same columns: the 5 key components (ET, PA, FC, TV and COMP), a peak_id column (Null for mRNA-Seq results), chromosome, gene name and id, p-value, adjusted p-value and log2 fold change. The two tables differ in their format:

res_simple: each result is reported once, with the filters that it passes combined with "|" (e.g., PA: 'all|prom'). This makes it quick to browse all results. Please note that ET = 'both' entries are not shown in this file.
res_filter: only results passing filters are reported, and each passed filter is on a different line (so 'all' and 'prom' would be on two different lines in the previous example). This file should be smaller as it excludes all non-significant results. It includes the entries significant in both ATAC-Seq and mRNA-Seq, with ET = 'both_ATAC' showing the ATAC-Seq results and ET = 'both_mRNA' showing the mRNA-Seq results.
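
The difference between the two layouts can be sketched in Python as follows. This is a simplified illustration, not the pipeline's implementation: the function name expand_row and the example row are hypothetical, and which columns may hold "|"-combined values is an assumption (only the PA example comes from the text above).

```python
from itertools import product


def expand_row(row, columns=("PA", "FC", "TV")):
    """Turn one res_simple-style row into res_filter-style rows.

    Each "|"-combined value is split, and one row is emitted per
    combination of passed filters.
    """
    options = [row[col].split("|") for col in columns]
    return [dict(row, **dict(zip(columns, combo))) for combo in product(*options)]


simple_row = {
    "ET": "ATAC",
    "PA": "all|prom",
    "FC": "up",
    "TV": "1.3",
    "COMP": "treated_vs_control",
    "peak_id": "peak_1",
}
filter_rows = expand_row(simple_row)
for r in filter_rows:
    print(r["peak_id"], r["PA"], r["FC"], r["TV"])
```

Here the single 'all|prom' row becomes two rows, one with PA = 'all' and one with PA = 'prom', while every other column is copied unchanged.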

Parameters

params.split__threshold_type: Defines whether the threshold cutoff is based on FDR (adjusted p-value) or rank. Options: 'FDR', 'rank'. Default: 'FDR'.
params.split__threshold_values: Groovy list defining the threshold cutoff value(s). If params.split__threshold_type = 'rank', all entries ranked below this value are kept (entries are ranked from the lowest adjusted p-value (rank = 1) to the highest). If params.split__threshold_type = 'FDR', all entries with a -log10(adjusted p-value) above this threshold are kept. For example, params.split__threshold_values = [ 1.3 ] keeps all entries with an adjusted p-value below 0.05 (-log10(0.05) = 1.30103). Multiple thresholds can be given, but they must all be of the same type (FDR or rank). Default: [ 1.3 ].
params.split__peak_assignment: Defines the peak assignment filters to use. See DA_ATAC__saving_detailed_results_tables for options. Default: [ 'all' ].
params.split__keep_unique_genes: Whether only unique DA and NDA genes should be kept for downstream analysis. Default: 'TRUE'.
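
The two threshold types can be sketched in Python as below. This is an illustration under stated assumptions rather than the pipeline's code: the function name passes_threshold is hypothetical, the rank cutoff is assumed to be inclusive, and ties are broken by input order. The FDR branch follows the worked example above, where a value of 1.3 keeps adjusted p-values below 0.05.

```python
import math


def passes_threshold(adj_pvalues, threshold_type, threshold_value):
    """Return one boolean per entry: True if the entry is kept."""
    if threshold_type == "FDR":
        # Keep entries whose -log10(adjusted p-value) exceeds the threshold.
        scores = [-math.log10(p) if p > 0 else math.inf for p in adj_pvalues]
        return [score > threshold_value for score in scores]
    if threshold_type == "rank":
        # Rank 1 is the lowest adjusted p-value (inclusive cutoff is an assumption).
        order = sorted(range(len(adj_pvalues)), key=lambda i: adj_pvalues[i])
        ranks = [0] * len(adj_pvalues)
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank
        return [rank <= threshold_value for rank in ranks]
    raise ValueError(f"unknown threshold type: {threshold_type!r}")


adj_pvalues = [0.001, 0.2, 0.04, 0.06]
kept_fdr = passes_threshold(adj_pvalues, "FDR", 1.3)
kept_rank = passes_threshold(adj_pvalues, "rank", 3)
print(kept_fdr)
print(kept_rank)
```

With a threshold of 1.3, the entry at 0.06 is dropped under 'FDR' because -log10(0.06) is about 1.22, but it is kept under 'rank' with a cutoff of 3 because it has the third lowest adjusted p-value.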

Outputs

Gene lists: Processed_Data/2_Differential_Analysis/DA_split__genes_rds/${key}__genes.rds
Bed files: Processed_Data/2_Differential_Analysis/DA_split__bed_regions/${key}__regions.bed
Res simple:
- Tables_Individual/2_Differential_Analysis/res_simple/${comparison}__res_simple.{csv,xlsx}
- Tables_Merged/2_Differential_Analysis/res_simple.{csv,xlsx}
Res filter:
- Tables_Individual/2_Differential_Analysis/res_filter/${comparison}__res_filter.{csv,xlsx}
- Tables_Merged/2_Differential_Analysis/res_filter.{csv,xlsx}

DA_split__plotting_venn_diagrams

Description

This process takes as input all gene lists made by the previous process for a given comparison and generates Venn diagrams for gene lists that share these keys: PA (DAR Peak Annotation), FC (Fold Change type) and TV (Threshold Value).
Two types of plots are made:

proportional two-way Venn diagrams: ATAC-Seq vs mRNA-Seq with FC either up or down
fixed-size four-way Venn diagrams: ATAC-Seq vs mRNA-Seq with FC up and down. In these plots, mRNA-Seq data has an orange filling, ATAC-Seq data has a blue filling, up-regulated genes have a purple outside line and down-regulated genes have a green outside line.
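
The grouping behind the two-way diagrams can be sketched in Python as follows. This only computes the region sizes and draws nothing; it is not the pipeline's plotting code. The function name two_way_counts and the example gene lists are hypothetical, and keys are assumed to follow the ${ET}__${PA}__${FC}__${TV}__${COMP} form.

```python
from collections import defaultdict


def two_way_counts(gene_lists):
    """Pair ATAC and mRNA gene lists that share PA, FC, TV and COMP."""
    grouped = defaultdict(dict)
    for key, genes in gene_lists.items():
        et, pa, fc, tv, comp = key.split("__", 4)
        if et in ("ATAC", "mRNA"):
            grouped[(pa, fc, tv, comp)][et] = set(genes)

    counts = {}
    for group, by_et in grouped.items():
        if len(by_et) < 2:
            continue  # no partner list, so no diagram for this group
        atac, mrna = by_et["ATAC"], by_et["mRNA"]
        counts[group] = {
            "ATAC_only": len(atac - mrna),
            "both": len(atac & mrna),
            "mRNA_only": len(mrna - atac),
        }
    return counts


gene_lists = {
    "ATAC__all__up__1.3__a_vs_b": {"g1", "g2", "g3"},
    "mRNA__all__up__1.3__a_vs_b": {"g2", "g3", "g4"},
    "ATAC__all__down__1.3__a_vs_b": {"g5"},
}
venn_counts = two_way_counts(gene_lists)
print(venn_counts)
```

The 'down' group has no mRNA-Seq partner in this toy input, so it is skipped in the sketch.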

Outputs

Two-way Venn diagrams:
- Figures_Individual/2_Differential_Analysis/Venn_diagrams__two_ways/${key}__venn_up_or_down.pdf
- Figures_Merged/2_Differential_Analysis/Venn_diagrams__two_ways.pdf

Four-way Venn diagrams:
- Figures_Individual/2_Differential_Analysis/Venn_diagrams__four_ways/${key}__venn_up_and_down.pdf
- Figures_Merged/2_Differential_Analysis/Venn_diagrams__four_ways.pdf
