This process produces a barplot showing the most significant enrichment results for the input DAS (Differential Analysis Subset).
Target names are shortened and duplicate entries are removed. The top n most significant entries are kept. The figure shows the size of the overlap on the x-axis and the term names on the y-axis. Entries are sorted by adjusted p-value (descending order) and by overlap of DA results (ascending order). The x-axis title indicates the total number of entries in the DAS (all DA entries) and, for genomic-region DASs (i.e. from bed files), the number of entries in the background (all NDA entries).
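The selection steps above can be sketched in Python. This is only an illustration, not the pipeline's R code: the field names (`name`, `padj`, `ov_da`), the function name, and the choice to keep the most significant entry among duplicated names are assumptions.

```python
def prepare_barplot_entries(entries, max_characters=50, max_terms=30):
    """Shorten names, drop duplicates, keep the most significant entries."""
    best = {}
    for entry in entries:
        short = entry["name"][:max_characters]
        # Assumption: among duplicated names, keep the lowest adjusted p-value.
        if short not in best or entry["padj"] < best[short]["padj"]:
            best[short] = {**entry, "name": short}
    # Keep the max_terms entries with the lowest adjusted p-values.
    top = sorted(best.values(), key=lambda e: e["padj"])[:max_terms]
    # Display order: adjusted p-value descending, then overlap ascending.
    return sorted(top, key=lambda e: (-e["padj"], e["ov_da"]))


example = [
    {"name": "alpha_long_name", "padj": 1e-4, "ov_da": 12},
    {"name": "beta", "padj": 1e-2, "ov_da": 5},
    {"name": "beta", "padj": 1e-6, "ov_da": 9},
    {"name": "gamma", "padj": 0.5, "ov_da": 2},
]
kept = prepare_barplot_entries(example, max_characters=5, max_terms=2)
```

With these toy values, the truncated names "alpha" and "beta" are kept and "gamma" is dropped as the least significant entry.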
Adjusted p-values are signed, with positive values for enrichment and negative values for depletion. The signed adjusted p-values are cut into 11 bins using 5 adjusted p-value cutoffs and their negative counterparts, on a log10 scale. On figures, enrichments are depicted in green and depletions in purple.
Finally, it is possible to add a colored point at the top of each bar to represent an additional variable (params.barplots__add_var) and to write the overlap count (params.barplots__add_number).
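One way to read the signed binning described above is sketched below in Python. It assumes that the signed score is -log10(padj) for enrichment and its negative for depletion, and that the 10 breaks are the 5 cutoffs on that scale plus their mirror images, which gives 11 bins. The function names are hypothetical and the actual R implementation may differ in detail (for example at the bin boundaries).

```python
import math
from bisect import bisect_right

# Default cutoffs from the padj_bin_breaks parameters.
CUTOFFS = [0.2, 0.05, 1e-5, 1e-20, 1e-100]


def signed_score(padj, enriched):
    """Return -log10(padj), negated for depletion. padj must be > 0."""
    magnitude = -math.log10(padj)
    return magnitude if enriched else -magnitude


def make_breaks(cutoffs):
    """Build the 10 symmetric breaks that delimit the 11 bins."""
    positive = sorted(-math.log10(c) for c in cutoffs)
    return [-b for b in reversed(positive)] + positive


def bin_index(padj, enriched, cutoffs=CUTOFFS):
    """Bin 0 is the strongest depletion, 5 the middle bin, 10 the strongest enrichment."""
    return bisect_right(make_breaks(cutoffs), signed_score(padj, enriched))
```

Under these assumptions, an enrichment with an adjusted p-value of 1e-3 falls in bin 7, and any adjusted p-value above 0.2 falls in the middle bin (5) whatever its sign.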
- params.save_barplots_rds: Whether barplots should be saved as rds objects. Default: false.
- params.common__{padj_bin_breaks,barplots_params,barplots_ggplot}: These parameters set the same value for every enrichment category. There is one such parameter for each of the settings listed in the braces (e.g., params.common__barplots_params). If null, the parameter is disabled; otherwise its value is used for the corresponding parameter of each enrichment category. Default: null.
- params.padj_bin_breaks__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing the 5 adjusted p-value cutoffs used for binning. There is one parameter for each enrichment category. Default: "c( 0.2, 0.05, 1e-5, 1e-20, 1e-100 )".
- params.barplots_params__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the barplots. There is one parameter for each enrichment category. Default: "c( 0.05, T, 'none', F, 50, 30 )". The options are, in order:
  - padj_threshold: If no adjusted p-value is significant at this threshold, the process is stopped and no figure is made.
  - signed_padj: Should enrichment and depletion be shown (T) or enrichment only (F).
  - add_var: Add a variable to the plots as a small dot. Options: 'none' (nothing added; default), 'L2OR' (log2 odds ratio), 'ov_da' (overlap of DA entries with target; i.e. counts), 'padj_loglog' (p-values on a log scale, where higher values correspond to lower p-values; formula: log10(-log10(pval) + 1)).
  - add_number: Write the overlap count on the plots.
  - max_characters: The length limit of term names.
  - max_terms: Number of terms to display.
- params.barplots_ggplot__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the appearance of the barplots by tweaking ggplot2 parameters. There is one parameter for each enrichment category. Default: "c( 11, 10, 7 )". The options are, in order:
  - axis_text_size: Axis text size.
  - title_text_size: Title text size.
  - legend_text_size: Legend text size.
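To make the positional layout of these option strings concrete, the following Python sketch parses a value such as "c( 0.05, T, 'none', F, 50, 30 )" and attaches the option names in the documented order. It is an illustration only: the pipeline evaluates these strings in R, the function names are hypothetical, and this simple parser does not handle commas inside quoted strings.

```python
BARPLOTS_PARAMS_KEYS = [
    "padj_threshold", "signed_padj", "add_var",
    "add_number", "max_characters", "max_terms",
]


def convert(token):
    """Convert one R-like token to a Python value."""
    if token in ("T", "TRUE"):
        return True
    if token in ("F", "FALSE"):
        return False
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]
    number = float(token)
    return int(number) if number.is_integer() else number


def parse_r_vector(text):
    """Parse a flat "c( a, b, ... )" string into a list of Python values."""
    inner = text.strip()
    if not (inner.startswith("c(") and inner.endswith(")")):
        raise ValueError(f"not an R c(...) vector: {text!r}")
    return [convert(item.strip()) for item in inner[2:-1].split(",")]


def named_options(text, keys):
    """Pair the parsed values with option names, in positional order."""
    values = parse_r_vector(text)
    if len(values) != len(keys):
        raise ValueError(f"expected {len(keys)} values, got {len(values)}")
    return dict(zip(keys, values))


options = named_options("c( 0.05, T, 'none', F, 50, 30 )", BARPLOTS_PARAMS_KEYS)
```

Because the values are positional, changing a single option still requires writing the full vector in the right order.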
- Barplots:
  - Figures_Individual/3_Enrichment_Analysis/Barplots__${EC}/${key}__barplot.pdf
  - Figures_Merged/3_Enrichment_Analysis/Barplots__${EC}.pdf
Note: The key for this process is ${ET}__${PA}__${FC}__${TV}__${COMP}__${EC}.
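As an illustration of how the key maps to the output paths listed above, here is a small Python sketch. The function names are hypothetical, the last field of the key is assumed to be the enrichment category ${EC}, and the example values passed at the end are placeholders rather than values taken from a real run.

```python
def barplot_key(et, pa, fc, tv, comp, ec):
    """Join the key fields with double underscores."""
    return "__".join([et, pa, fc, tv, comp, ec])


def barplot_paths(et, pa, fc, tv, comp, ec):
    """Return the (individual, merged) barplot paths for one key."""
    key = barplot_key(et, pa, fc, tv, comp, ec)
    base = "3_Enrichment_Analysis"
    individual = f"Figures_Individual/{base}/Barplots__{ec}/{key}__barplot.pdf"
    merged = f"Figures_Merged/{base}/Barplots__{ec}.pdf"
    return individual, merged


# Placeholder values, for illustration only.
individual, merged = barplot_paths("ATAC", "all", "up", "1.3", "cond1_vs_cond2", "CHIP")
```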
Example barplots (figures not shown): Genes self, Peaks self, Functional annotations GO-BP, Functional annotations KEGG, Chromatin states, CHIP, Motifs.
This process takes as input all enrichment results for comparisons of a given group (as specified in the comparisons.tsv file, ${GRP} key) and that share the same keys for ${ET} (Experiment Type), ${PA} (DAR Peak Annotation), ${TV} (Threshold Value) and ${EC} (Enrichment Category), filters the most relevant terms, and produces a heatmap.
The heatmap shows the selected terms on the y-axis and the comparisons with fold change type (COMP_FC) on the x-axis, with this format: ${condition_1} > ${condition_2} when ${FC} is up and ${condition_1} < ${condition_2} when ${FC} is down.
The order of the COMP_FC entries on the x-axis (and on the y-axis for the peaks_self and genes_self enrichment categories) is defined by the comparisons.tsv input file as well as by the up_down_pattern option, which can be set within the params.heatmaps_params parameter (see the next subsection below).
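The label format and the two documented ordering patterns can be sketched as follows in Python. Comparisons are represented here as (condition_1, condition_2) tuples in the order given by the comparisons file; that representation and the function names are assumptions made for the example.

```python
def comp_fc_label(condition_1, condition_2, fc):
    """Format one COMP_FC label: '>' for up, '<' for down."""
    sign = {"up": ">", "down": "<"}[fc]
    return f"{condition_1} {sign} {condition_2}"


def order_comp_fc(comparisons, up_down_pattern="UUDD"):
    """Order COMP_FC labels according to the up_down_pattern option."""
    if up_down_pattern == "UDUD":
        # up, down, up, down, ...
        pairs = [(comp, fc) for comp in comparisons for fc in ("up", "down")]
    elif up_down_pattern == "UUDD":
        # up, up, ..., down, down, ...
        pairs = [(comp, fc) for fc in ("up", "down") for comp in comparisons]
    else:
        raise ValueError(f"unknown pattern: {up_down_pattern!r}")
    return [comp_fc_label(c1, c2, fc) for (c1, c2), fc in pairs]


comparisons = [("a", "b"), ("c", "d")]
uudd = order_comp_fc(comparisons, "UUDD")
udud = order_comp_fc(comparisons, "UDUD")
```

With two comparisons, "UUDD" yields a > b, c > d, a < b, c < d, whereas "UDUD" yields a > b, a < b, c > d, c < d.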
The order of the terms of the chrom_states enrichment category (chromatin states) is defined by the chromatin state group (as defined in the original publication).
All terms are shown for the peaks_self, genes_self and chrom_states enrichment categories.
For the CHIP, motifs and func_anno enrichment categories, a function has been created to select terms of interest (see the params.heatmaps__df_filter_terms parameter).
Briefly, this function first removes terms with similar names. Next, it selects the top shared terms (lowest median absolute p-values across COMP_FC). Then, it selects the top terms for each COMP_FC. After that, the terms with the lowest p-values across all COMP_FC are selected to reach the desired number of terms. Finally, hierarchical clustering (with Euclidean distance) is performed to order terms by similarity.
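The selection steps described above can be sketched in Python as follows. This is a simplified illustration, not the pipeline's R function: it assumes unsigned p-values stored as a dictionary of term to {COMP_FC: p-value}, it breaks ties by input order instead of randomly, and it leaves out the removal of similar names and the final hierarchical clustering.

```python
from statistics import median


def select_terms(pvals, n_total, n_shared, n_unique):
    """Select up to n_total terms: shared first, then per-COMP_FC, then fill."""
    comp_fcs = sorted({comp for per_term in pvals.values() for comp in per_term})
    selected = []

    def add(terms):
        # Append terms not selected yet, never exceeding n_total.
        for term in terms:
            if len(selected) >= n_total:
                return
            if term not in selected:
                selected.append(term)

    # 1. Shared terms: lowest median p-value across COMP_FC.
    by_median = sorted(pvals, key=lambda t: median(pvals[t].values()))
    add(by_median[:n_shared])

    # 2. Top terms of each COMP_FC (n_unique / n_comp, rounded down).
    top_n = n_unique // len(comp_fcs)
    for comp in comp_fcs:
        ranked = sorted((t for t in pvals if comp in pvals[t]),
                        key=lambda t: pvals[t][comp])
        add(ranked[:top_n])

    # 3. Fill the remaining slots with the lowest p-values overall.
    add(sorted(pvals, key=lambda t: min(pvals[t].values())))
    return selected


toy = {
    "A": {"x_up": 1e-10, "x_down": 1e-9},
    "B": {"x_up": 1e-2, "x_down": 1e-12},
    "C": {"x_up": 1e-8, "x_down": 0.5},
    "D": {"x_up": 0.3, "x_down": 0.4},
    "E": {"x_up": 1e-3, "x_down": 1e-3},
}
chosen = select_terms(toy, n_total=4, n_shared=1, n_unique=2)
```

In this toy input, A is picked as the shared term, B as the top term of x_down (the top term of x_up, A, is already selected), and C and E fill the two remaining slots.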
Cells are colored by signed and binned adjusted p-values, as described in the previous process. Several options are shared by both processes and are set here through the heatmaps_params parameter.
Note: The genes-self and peaks-self heatmaps are not always symmetrical. This is because the heatmap shows the enrichment of entries from the left side in the entries on the bottom side, and thus the target (the set to overlap with) and the background (NDA: entries Not in the Differential Analysis subset) are different (see the results tables for examples of these calculations).
- params.save_heatmaps_rds: Whether heatmaps should be saved as rds objects. Default: false.
- params.common__{padj_bin_breaks,heatmaps_params,heatmaps_ggplot,heatmaps_filter}: These parameters set the same value for every enrichment category. There is one such parameter for each of the settings listed in the braces (e.g., params.common__heatmaps_params). If null, the parameter is disabled; otherwise its value is used for the corresponding parameter of each enrichment category. Default: null.
- params.padj_bin_breaks: Same parameter as in the previous process.
- params.heatmaps__seed: Random seed for the selection of terms. Default: 38.
- params.heatmaps_params__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the heatmaps. There is one parameter for each enrichment category. Default for genes_self and peaks_self: "c( 0.05, T, 'none', T, 50, 'UUDD', 0 )". Default for func_anno, chrom_states, CHIP and motifs: "c( 0.05, T, 'none', F, 50, 'UUDD', 0 )". The options are, in order:
  - padj_threshold: If no adjusted p-value is significant at this threshold, the process is stopped and no figure is made.
  - signed_padj: Should enrichment and depletion be shown (T) or enrichment only (F).
  - add_var: Add a variable to the plots as a small dot. Options: 'none' (nothing added; default), 'L2OR' (log2 odds ratio), 'ov_da' (overlap of DA entries with target; i.e. counts), 'padj_loglog' (p-values on a log scale, where higher values correspond to lower p-values; formula: log10(-log10(pval) + 1)).
  - add_number: Write the overlap count on the cells.
  - max_characters: The length limit of target names. Longer target names are cut.
  - up_down_pattern: The pattern in which fold changes are displayed. Options: "UDUD" (up, down, up, down, ...) or "UUDD" (up, up, ..., down, down, ...).
  - cell_text_size: Controls the text size in the cells of the heatmap when the add_number option is set to true. If set to zero, the text size is determined automatically by Cactus according to the number of comparisons on the heatmap.
- params.heatmaps_ggplot__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the appearance of the heatmaps by tweaking ggplot2 parameters. There is one parameter for each enrichment category. Default: "c( 11, 10, 7 )". The options are, in order:
  - axis_text_size: Axis text size.
  - title_text_size: Title text size.
  - legend_text_size: Legend text size.
- params.heatmaps_filter__{func_anno,CHIP,motifs}: A string converted to a vector in R containing options to customize the selection of terms for the heatmaps. Such filtering parameters are only available for the func_anno, CHIP and motifs enrichment categories. Default for func_anno: "c( 26, 18, 8, F, 2, 'ward.D')". Default for CHIP and motifs: "c( 40, 30, 10, T, 2, 'ward.D')". The options are, in order:
  - n_total: Total number of terms to select. This number should be higher than or equal to n_shared + n_unique. If it is higher, the remaining slots are filled with the terms that have the lowest p-values across all COMP_FC (with ties sorted randomly).
  - n_shared: Number of shared terms to select. Shared terms are defined as terms with the highest median absolute -log10 p-value across COMP_FC.
  - n_unique: Number of top terms to select. top_N is defined as n_unique / n_comp (with n_comp being the number of COMP_FC), rounded down. Then, for each COMP_FC, the top_N terms with the lowest p-values are selected.
  - remove_similar: If true (T), entries with similar names are removed. Similar names are defined as names that are identical before the final underscore, e.g. FOXO_L1 and FOXO_L2. For each group of similar entries, the lowest p-value of each entry is computed and the top remove_similar_n entries with the lowest p-values are kept.
  - remove_similar_n: See remove_similar above.
  - agglomeration_method: Agglomeration method used for hierarchical clustering of selected terms on the y-axis (e.g. 'ward.D'; see the documentation of R's hclust function for options).
  - select_enriched: Boolean indicating if only the most enriched terms should be selected (if TRUE/T) or the most enriched or depleted terms (if FALSE/F).
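The remove_similar rule can be illustrated with a short Python sketch. It assumes that the lowest p-value of each entry has already been computed and is given as a dictionary; the function name and the toy values are hypothetical.

```python
def remove_similar(best_pval, remove_similar_n):
    """Keep the remove_similar_n best entries within each group of similar names.

    best_pval maps each entry name to its lowest p-value. Two names are
    considered similar when they are identical before the final underscore.
    """
    groups = {}
    for name in best_pval:
        prefix = name.rsplit("_", 1)[0]
        groups.setdefault(prefix, []).append(name)

    kept = []
    for names in groups.values():
        names.sort(key=lambda n: best_pval[n])
        kept.extend(names[:remove_similar_n])
    return kept


toy_pvals = {"FOXO_L1": 1e-5, "FOXO_L2": 1e-8, "FOXO_L3": 1e-3, "GATA1_L1": 0.01}
kept_names = remove_similar(toy_pvals, remove_similar_n=2)
```

Here the three FOXO entries form one group, of which only the two with the lowest p-values (FOXO_L2 and FOXO_L1) are kept, while GATA1_L1 is alone in its group and is always kept.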
- Heatmaps:
  - Figures_Individual/3_Enrichment_Analysis/Heatmaps__${EC}/${key}__heatmap.pdf
  - Figures_Merged/3_Enrichment_Analysis/Heatmaps__${EC}.pdf
Note: The key for this process is ${ET}__${PA}__${TV}__${GRP}__${EC}, with ${GRP} being the current group of comparisons.
Example heatmaps (figures not shown): Genes self, Peaks self, Functional annotations GO-BP, Functional annotations KEGG, Chromatin states, CHIP, Motifs.
This process uses pdftk to merge PDF files.
Note: Output files and paths are specified in the processes where the individual figures are created.
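For readers who want to reproduce such a merge by hand, the sketch below builds a standard pdftk concatenation command (`pdftk in1.pdf in2.pdf cat output merged.pdf`) from Python. How the pipeline itself invokes pdftk is not described here, so the function names and the file names in the example are assumptions.

```python
import subprocess


def pdftk_merge_command(input_pdfs, output_pdf):
    """Build the pdftk command that concatenates input_pdfs into output_pdf."""
    if not input_pdfs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *input_pdfs, "cat", "output", output_pdf]


def merge_pdfs(input_pdfs, output_pdf):
    """Run pdftk (must be installed and on PATH); raises if the merge fails."""
    subprocess.run(pdftk_merge_command(input_pdfs, output_pdf), check=True)


command = pdftk_merge_command(["a__barplot.pdf", "b__barplot.pdf"], "Barplots.pdf")
```

Passing the command as a list avoids shell quoting issues with file names that contain special characters.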