-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathTutorial_statistical_analysis.Rmd
72 lines (63 loc) · 2.42 KB
/
Tutorial_statistical_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
---
title: "Tutorial_statistical_analysis"
author: "Lukas Simon"
date: "9 Mai 2018"
output: html_document
---
This R code contains a short tutorial for the statistical analysis of the MetaMap output.
First, we will reduce the MetaMap dataset to project SRP066090 (Zheng et al.) and run differential metafeature abundance analysis reproducing parts of Figure 2 of the manuscript.
Load R libraries.
```{r warning=FALSE}
suppressMessages(library(Matrix))
suppressMessages(library(DESeq2))
suppressMessages(library(ggrepel))
suppressMessages(library(ggplot2))
```
Load the MetaMap data.
```{r}
load("data/MetaMapData.RData")
```
Restrict data to SRA study SRP066090.
```{r}
ok <- which(sample_info$study == 'SRP066090')
sample_info <- sample_info[ok,]
meta_count <- meta_count[,ok]
```
Define the two treatment groups.
```{r}
treat <- rep("positive", ncol(meta_count))
treat[grep("Neg", sample_info$sample_attribute)] <- "negative"
```
Run differential metafeature abundance analysis using DESeq2.
```{r}
des <- DESeqDataSetFromMatrix(countData = round(meta_count), colData = data.frame(treat = treat), design = ~ treat)
sizeFactors(des) <- log10(sample_info$spots) / median(log10(sample_info$spots))
res <- results(DESeq(des), contrast = c('treat', 'positive', 'negative'))
```
Generate differential expression volcano plot. The red dot highlights the human alpha papilloma virus (HPV), which is the most significant metafeature.
```{r}
tmp <- data.frame(res)
tmp <- tmp[sort.list(tmp$pvalue),]
tmp <- tmp[which(!is.na(tmp$log2FoldChange)),]
speciesNames.zhang <- meta_info[match(rownames(tmp), meta_info[,'TaxID']), 'Species']
makeVolcano <- function(res, nom){
ggplot(tmp, aes(log2FoldChange, -log10(pvalue))) +
geom_point(color = rgb(0,0,0,0.5)) +
geom_point(data = tmp[1, ], aes(x=log2FoldChange, y=-log10(pvalue)), color="red") +
ggtitle(nom) +
theme_classic(base_size = 16) +
labs(x = "Fold change (log2)", y = "-log10 p-value")
}
makeVolcano(tmp, nom = 'Zhang et al')
```
Now, we can plot the HPV levels by treatment.
```{r}
abundance <- t(t(meta_count + 1) / sample_info$spots) * 1e6
species <- abundance[which(rownames(abundance) == "337041"),]
aframe <- data.frame(Alphapapillomavirus.9 = species, treat)
ggplot(aframe, aes(treat, Alphapapillomavirus.9)) + geom_boxplot() + geom_jitter(width = 0.2) + theme(axis.title.x=element_blank()) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
```
Display session info.
```{r}
sessionInfo()
```