List of processes

Figures__making_enrichment_barplots

Description

This process produces a barplot showing the most significant results for the input DAS.

Target names are shortened and duplicate entries are removed, then the top n most significant entries are kept. The figure shows the size of the overlap on the x-axis and the term name on the y-axis. Entries are sorted by adjusted p-value (descending order) and then by overlap of DA results (ascending order). The x-axis title indicates the total number of entries in the DAS (all DA entries) and, for genomic-region DASs (i.e. from bed files), the number of entries in the background (all NDA entries).
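
The preparation steps above can be sketched as follows. This is a minimal Python illustration, not the pipeline's own R code: the function name prepare_barplot_rows and the column names tgt, padj and ov_da are assumptions made for the example, and keeping the most significant copy of a duplicated name is also an assumption.

```python
def prepare_barplot_rows(rows, max_characters=50, max_terms=30):
    """Shorten names, drop duplicates, keep the top terms, then order them.

    rows: list of dicts with the (hypothetical) keys 'tgt', 'padj', 'ov_da'.
    """
    seen = set()
    unique = []
    # Visit the most significant entries first, so that the best copy of a
    # duplicated (shortened) name is the one that is kept.
    for row in sorted(rows, key=lambda r: (r["padj"], -r["ov_da"])):
        name = row["tgt"][:max_characters]
        if name in seen:
            continue
        seen.add(name)
        unique.append({**row, "tgt": name})

    top = unique[:max_terms]
    # Final display order: adjusted p-value descending, then overlap ascending.
    return sorted(top, key=lambda r: (-r["padj"], r["ov_da"]))
```

With this ordering the most significant term comes last in the returned list, which is the position a plotting library typically draws at the top of a horizontal barplot.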

Adjusted p-values are signed: positive values indicate enrichment and negative values indicate depletion. The signed adjusted p-values are then cut into 11 bins on a log10 scale, using 5 adjusted p-value cutoffs and their negative counterparts. On figures, enrichments are depicted in green and depletions in purple.
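
To make the binning concrete, here is a minimal Python sketch (the pipeline itself does this in R). It assumes the default cutoffs listed under Parameters; the function name signed_padj_bin and the choice to number bins from -5 to 5 are illustrative only.

```python
import bisect
import math

def signed_padj_bin(padj, enriched, breaks=(0.2, 0.05, 1e-5, 1e-20, 1e-100)):
    """Return a bin index in -5..5 (11 possible values).

    0 means the first cutoff is not reached; positive indices are enrichment
    levels and negative indices are depletion levels.
    """
    # Cutoffs on the -log10 scale, in increasing order of significance.
    cuts = sorted(-math.log10(b) for b in breaks)
    if padj <= 0:
        level = len(cuts)  # a p-value of zero falls in the most extreme bin
    else:
        level = bisect.bisect_right(cuts, -math.log10(padj))
    return level if enriched else -level
```

For example, an adjusted p-value of 0.01 for an enriched term passes the 0.2 and 0.05 cutoffs but not 1e-5, so it lands in bin 2; the same value for a depleted term lands in bin -2.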

Finally, it is possible to add a colored point at the top of each bar to represent an additional variable (params.barplots__add_var) and to write the overlap count (params.barplots__add_number).

Parameters

  • params.save_barplots_rds: Whether barplots should also be saved as rds objects. Default: false.

  • params.common__{padj_bin_breaks,barplots_params,barplots_ggplot}: These parameters set the same value for every enrichment category. There is one common parameter for each of the listed settings (e.g., params.common__barplots_params). If null, the parameter is disabled; otherwise its value is used for the corresponding parameter of each enrichment category. Default: null.

  • params.padj_bin_breaks__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing the 5 adjusted p-value cutoffs used for binning. There is one parameter for each enrichment category. Default: "c( 0.2, 0.05, 1e-5, 1e-20, 1e-100 )".

  • params.barplots_params__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the barplots. There is one parameter for each enrichment category. Default: "c( 0.05, T, 'none', F, 50, 30 )". The options are in order:

    • padj_threshold: If no adjusted p-value is below this threshold, the process is stopped and no figure is made.
    • signed_padj: Should enrichment and depletion be shown (T) or enrichment only (F).
    • add_var: Add a variable to the plots as a small dot. Options: 'none' (nothing added; default), 'L2OR' (log2 odds ratio), 'ov_da' (overlap of DA entries with target; i.e. counts), 'padj_loglog' (p-values on a log scale, where higher values correspond to lower p-values; formula: log10(-log10(pval) + 1)).
    • add_number: Write the number count on the plots.
    • max_characters: Maximum length of term names.
    • max_terms: Number of terms to display.
  • params.barplots_ggplot__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the appearance of the barplots by tweaking ggplot2 parameters. There is one parameter for each enrichment category. Default: "c( 11, 10, 7 )". The options are in order:

    • axis_text_size: Axis text size.
    • title_text_size: Title text size.
    • legend_text_size: Legend text size.
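
Because the options are positional, it can help to see how a parameter string maps to named options. The Python sketch below only illustrates that mapping and the padj_loglog formula; the pipeline does the conversion in R, and the helper names parse_r_vector, name_options and padj_loglog are hypothetical.

```python
import ast
import math

BARPLOT_OPTION_NAMES = (
    "padj_threshold", "signed_padj", "add_var",
    "add_number", "max_characters", "max_terms",
)

def parse_r_vector(text):
    """Parse a simple R vector string such as "c( 0.05, T, 'none' )".

    Only flat vectors of numbers, quoted strings and T/F are handled;
    strings containing commas are not supported by this naive split.
    """
    text = text.strip()
    if not (text.startswith("c(") and text.endswith(")")):
        raise ValueError(f"not an R vector: {text!r}")
    values = []
    for item in (part.strip() for part in text[2:-1].split(",")):
        if item in ("T", "TRUE"):
            values.append(True)
        elif item in ("F", "FALSE"):
            values.append(False)
        else:
            values.append(ast.literal_eval(item))
    return values

def name_options(text, names):
    """Pair each positional value with its option name."""
    values = parse_r_vector(text)
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} options, got {len(values)}")
    return dict(zip(names, values))

def padj_loglog(pval):
    """The 'padj_loglog' transformation: log10(-log10(pval) + 1)."""
    return math.log10(-math.log10(pval) + 1)
```

Applied to the default string "c( 0.05, T, 'none', F, 50, 30 )", name_options returns padj_threshold 0.05, signed_padj true, add_var 'none', add_number false, max_characters 50 and max_terms 30.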

Outputs

  • Barplots:
    • Figures_Individual/3_Enrichment_Analysis/Barplots__${EC}/${key}__barplot.pdf
    • Figures_Merged/3_Enrichment_Analysis/Barplots__${EC}.pdf

Note: The key for this process is ${ET}__${PA}__${FC}__${TV}__${COMP}__${EC}.

Examples

  • Genes self:

  • Peaks self:

  • Functional annotations GO-BP:

  • Functional annotations KEGG:

  • Chromatin states:

  • CHIP:

  • Motifs:

Figures__making_enrichment_heatmap

Description

This process takes as input all enrichment results for comparisons of a given group (as specified in the comparisons.tsv file, ${GRP} key) and that share the same keys for ${ET} (Experiment Type), ${PA} (DAR Peak Annotation), ${TV} (Threshold Value) and ${EC} (Enrichment Category), filters the most relevant terms, and produces a heatmap.

The heatmap shows the selected terms on the y-axis and the comparisons with fold change type (COMP_FC) on the x-axis with this format: ${condition_1} > ${condition_2} when ${FC} is up and ${condition_1} < ${condition_2} when ${FC} is down.

The order of the COMP_FC entries on the x-axis (and also on the y-axis for the peaks_self and genes_self enrichment categories) is defined by the comparisons.tsv input file and by the up_down_pattern option, which is set within the params.heatmaps_params parameter (see the Parameters subsection below).
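
The labelling and the two ordering patterns can be illustrated with a short Python sketch. The function name comp_fc_labels and the condition names used in the usage note are hypothetical; comparisons are assumed to be given in the order of the comparisons.tsv file.

```python
def comp_fc_labels(comparisons, up_down_pattern="UUDD"):
    """Build ordered COMP_FC labels from (condition_1, condition_2) pairs."""
    up = [f"{c1} > {c2}" for c1, c2 in comparisons]
    down = [f"{c1} < {c2}" for c1, c2 in comparisons]
    if up_down_pattern == "UUDD":
        # All "up" columns first, then all "down" columns.
        return up + down
    if up_down_pattern == "UDUD":
        # Alternate up and down for each comparison in turn.
        return [label for pair in zip(up, down) for label in pair]
    raise ValueError(f"unknown up_down_pattern: {up_down_pattern!r}")
```

For the comparisons (a, ctl) and (b, ctl), "UUDD" gives a > ctl, b > ctl, a < ctl, b < ctl, whereas "UDUD" gives a > ctl, a < ctl, b > ctl, b < ctl.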

The order of the terms of the chrom_states enrichment category (chromatin states) is defined by the chromatin state group (as defined in the original publication).

All terms are shown for the peaks_self, genes_self and chrom_states enrichment categories.

For the CHIP, motifs and func_anno enrichment categories, a function selects the terms of interest (see the params.heatmaps__df_filter_terms parameter). Briefly, this function first removes terms with similar names. Next, it selects the top shared terms (lowest median absolute p-values across COMP_FC). Then, it selects the top terms for each COMP_FC. After that, the terms with the lowest p-values across all COMP_FC are selected until the desired number of terms is reached. Finally, hierarchical clustering (with Euclidean distance) is performed to order terms by similarity.
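
The selection steps can be sketched in Python as below. This is an illustration under stated assumptions, not the pipeline's R implementation: the function name select_terms and the input layout (term, then COMP_FC, then p-value) are hypothetical, terms already selected in an earlier step are assumed to be skipped in later steps, and the similar-name filter and the final hierarchical clustering are left out.

```python
import random
from statistics import median

def select_terms(pvals, n_total, n_shared, n_unique, seed=38):
    """pvals: {term: {comp_fc: p-value}} with the same COMP_FC keys per term."""
    terms = list(pvals)
    comp_fcs = list(next(iter(pvals.values())))
    selected = []

    def take(ranked, n):
        # Append up to n not-yet-selected terms, following the ranking.
        for term in ranked:
            if n <= 0:
                break
            if term not in selected:
                selected.append(term)
                n -= 1

    # 1. Shared terms: lowest median p-value across COMP_FC.
    take(sorted(terms, key=lambda t: median(pvals[t].values())), n_shared)

    # 2. Top terms for each COMP_FC: n_unique / n_comp, rounded down.
    top_n = n_unique // len(comp_fcs)
    for comp in comp_fcs:
        take(sorted(terms, key=lambda t: pvals[t][comp]), top_n)

    # 3. Fill the remaining slots with the lowest p-values overall.
    #    Shuffling before a stable sort breaks ties randomly but reproducibly.
    shuffled = terms[:]
    random.Random(seed).shuffle(shuffled)
    ranked = sorted(shuffled, key=lambda t: min(pvals[t].values()))
    take(ranked, n_total - len(selected))
    return selected
```

The returned list is in selection order; in the process described here, the selected terms are then reordered by hierarchical clustering.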

Cells are colored with signed and binned adjusted p-values as described in the previous process. Several options from the previous process are also available here through the heatmaps_params parameter.

Note: The genes_self and peaks_self heatmaps are not always symmetrical. This is because the heatmap shows the enrichment of the entries on the left side in the entries on the bottom side, so the target (the set to overlap with) and the background (NDA: entries Not in the Differential Analysis subset) differ between the two directions (see the results tables for examples of these calculations).

Parameters

  • params.save_heatmaps_rds: Whether heatmaps should also be saved as rds objects. Default: false.

  • params.common__{padj_bin_breaks,heatmaps_params,heatmaps_ggplot,heatmaps_filter}: These parameters set the same value for every enrichment category. There is one common parameter for each of the listed settings (e.g., params.common__heatmaps_params). If null, the parameter is disabled; otherwise its value is used for the corresponding parameter of each enrichment category. Default: null.

  • params.padj_bin_breaks: Same parameter as in the previous process.

  • params.heatmaps__seed: Random seed for the selection of terms. Default: 38.

  • params.heatmaps_params__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the heatmaps. There is one parameter for each enrichment category. Default for genes_self and peaks_self: "c( 0.05, T, 'none', T, 50, 'UUDD', 0 )". Default for func_anno, chrom_states, CHIP and motifs: "c( 0.05, T, 'none', F, 50, 'UUDD', 0 )". The options are in order:

    • padj_threshold: If no adjusted p-value is below this threshold, the process is stopped and no figure is made.
    • signed_padj: Should enrichment and depletion be shown (T) or enrichment only (F).
    • add_var: Add a variable to the plots as a small dot. Options: 'none' (nothing added; default), 'L2OR' (log2 odds ratio), 'ov_da' (overlap of DA entries with target; i.e. counts), 'padj_loglog' (p-values on a log scale, where higher values correspond to lower p-values; formula: log10(-log10(pval) + 1)).
    • add_number: Write the overlap count on the cells.
    • max_characters: Maximum length of target names. Longer target names are cut.
    • up_down_pattern: The pattern of how Fold Changes are displayed. Options: "UDUD" (up, down, up, down...) or "UUDD" (up, up, ..., down, down ...).
    • cell_text_size: Text size in the cells of the heatmap when the add_number option is set to true. If set to zero, the text size is determined automatically by Cactus according to the number of comparisons on the heatmap.
  • params.heatmaps_ggplot__{genes_self,peaks_self,func_anno,chrom_states,CHIP,motifs}: A string converted to a vector in R containing options to customize the appearance of the heatmaps by tweaking ggplot2 parameters. There is one parameter for each enrichment category. Default: "c( 11, 10, 7 )". The options are in order:

    • axis_text_size: Axis text size.
    • title_text_size: Title text size.
    • legend_text_size: Legend text size.
  • params.heatmaps_filter__{func_anno,CHIP,motifs}: A string converted to a vector in R containing options to customize the selection of terms for the heatmaps. Such filtering parameters are only available for the func_anno, CHIP and motifs enrichment categories. Default for func_anno: "c( 26, 18, 8, F, 2, 'ward.D')". Default for CHIP and motifs: "c( 40, 30, 10, T, 2, 'ward.D')". The options are in order:

    • n_total: Total number of terms to select. This number should be higher than or equal to n_shared + n_unique. If it is higher, the remaining slots are filled with the terms that have the lowest p-values across all COMP_FC (with ties sorted randomly).
    • n_shared: Number of shared terms to select. Shared terms are defined as the terms with the highest median absolute -log10 p-value across COMP_FC.
    • n_unique: Number of top terms to select across all COMP_FC. top_N is defined as n_unique / n_comp (with n_comp being the number of COMP_FC), rounded down. Then, for each COMP_FC, the top_N terms with the lowest p-values are selected.
    • remove_similar: If true (T), entries with similar names are removed. Similar names are defined as names that are identical before the final underscore, e.g. FOXO_L1 and FOXO_L2. Within each group of similar entries, the lowest p-value of each entry is computed and the remove_similar_n entries with the lowest p-values are kept.
    • remove_similar_n: Number of entries to keep within each group of similar names. See remove_similar above.
    • agglomeration_method: Agglomeration method used for hierarchical clustering of selected terms on the y-axis. See the documentation of the R hclust function (method argument) for options.
    • select_enriched: Boolean indicating if only the most enriched terms should be selected (if TRUE/T) or the most enriched or depleted terms (if FALSE/F).
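
As an illustration of the remove_similar and remove_similar_n options, here is a minimal Python sketch. The function name and the input layout (term, then COMP_FC, then p-value) are assumptions for the example; the pipeline performs this step in R.

```python
from collections import defaultdict

def remove_similar(pvals, remove_similar_n=2):
    """Keep at most remove_similar_n terms per group of similar names.

    pvals: {term: {comp_fc: p-value}}. Two names are similar when they are
    identical before the final underscore (e.g. FOXO_L1 and FOXO_L2).
    """
    groups = defaultdict(list)
    for term, row in pvals.items():
        prefix = term.rsplit("_", 1)[0]  # names without "_" form their own group
        groups[prefix].append((min(row.values()), term))

    kept = set()
    for members in groups.values():
        # Rank group members by their lowest p-value and keep the best ones.
        for _, term in sorted(members)[:remove_similar_n]:
            kept.add(term)
    # Preserve the original order of the retained terms.
    return {term: row for term, row in pvals.items() if term in kept}
```

With three FOXO entries whose lowest p-values are 0.01, 1e-5 and 0.04 and remove_similar_n set to 2, the entry with 0.04 is dropped and the other two are kept.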

Outputs

  • Figures_Individual/3_Enrichment_Analysis/Heatmaps__${EC}/${key}__heatmap.pdf
  • Figures_Merged/3_Enrichment_Analysis/Heatmaps__${EC}.pdf

Note: The key for this process is ${ET}__${PA}__${TV}__${GRP}__${EC}, ${GRP} being the current group of comparisons.

Examples

  • Genes self:

  • Peaks self:

  • Functional annotations GO-BP:

  • Functional annotations KEGG:

  • Chromatin states:

  • CHIP:

  • Motifs:

Figures__merging_pdfs

Description

This process uses pdftk to merge PDF files.

Note: Output files and path are specified in the process where they were created.
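
For reference, merging with pdftk relies on its cat operation. The Python sketch below builds such a command; it is an assumption about how a merge could be invoked rather than the exact command used by the process, and the function names and file names are hypothetical.

```python
import subprocess

def build_pdftk_merge_command(input_pdfs, output_pdf):
    """Return the pdftk argument list that concatenates input_pdfs in order."""
    if not input_pdfs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *input_pdfs, "cat", "output", output_pdf]

def merge_pdfs(input_pdfs, output_pdf):
    """Run pdftk (which must be installed) to write the merged file."""
    command = build_pdftk_merge_command(sorted(input_pdfs), output_pdf)
    subprocess.run(command, check=True)
```

Sorting the inputs in merge_pdfs keeps the page order of the merged file reproducible between runs.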