---
output:
  github_document:
    html_preview: false
editor_options:
  chunk_output_type: console
bibliography: library.bib
---
```{r setup1, include=FALSE}
knitr::opts_chunk$set(fig.path = "man/figures/README_", warning = FALSE, message = FALSE, error = FALSE, echo = TRUE)
submission_to_cran <- TRUE
library(tidyverse)
```
# SCORPIUS
<!-- badges: start -->
[![R-CMD-check](https://github.com/rcannood/SCORPIUS/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rcannood/SCORPIUS/actions/workflows/R-CMD-check.yaml)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/SCORPIUS)](https://cran.r-project.org/package=SCORPIUS)
[![Codecov test coverage](https://codecov.io/gh/rcannood/SCORPIUS/branch/master/graph/badge.svg)](https://app.codecov.io/gh/rcannood/SCORPIUS?branch=master)
<!-- badges: end -->
SCORPIUS is an unsupervised approach for inferring linear developmental chronologies from single-cell
RNA sequencing data. In comparison to similar approaches, it has three main advantages:
* **It accurately reconstructs linear dynamic processes.**
The performance was evaluated using a quantitative evaluation pipeline and
ten single-cell RNA sequencing datasets.
* **It automatically identifies marker genes, speeding up knowledge discovery.**
* **It is fully unsupervised.** Prior knowledge of the relevant marker genes or
cellular states of individual cells is not required.
News:
* See `news(package = "SCORPIUS")` for a full list of changes to the package.
* Our preprint is on [bioRxiv](https://biorxiv.org/content/early/2016/10/07/079509)
[@Cannoodt2016].
* Check out our [review](https://dx.doi.org/10.1038/s41587-019-0071-9) of trajectory inference methods
[@Saelens2019].
## Installing SCORPIUS
You can install:
* the latest released version from CRAN with
```R
install.packages("SCORPIUS")
```
* the latest development version from GitHub with
```R
devtools::install_github("rcannood/SCORPIUS", build_vignettes = TRUE)
```
If you encounter a bug, please file a minimal reproducible example on the [issues](https://github.com/rcannood/SCORPIUS/issues) page.
## Learning SCORPIUS
To get started, read the introductory example below, or read one of the vignettes containing more elaborate examples:
```{r vignettes, results='asis', echo=FALSE}
# List each vignette with its title and the command to open it
walk(
  list.files("vignettes", pattern = "\\.Rmd$"),
  function(file) {
    # Extract the title from the vignette's YAML header
    title <-
      read_lines(paste0("vignettes/", file)) %>%
      keep(~grepl("^title: ", .)) %>%
      gsub("title: \"(.*)\"", "\\1", .)
    vignette_name <- gsub("\\.Rmd$", "", file)
    cat(
      "* ", title, ": \n",
      "`vignette(\"", vignette_name, "\", package=\"SCORPIUS\")`\n",
      sep = ""
    )
  }
)
```
## Introductory example
```{r, echo=F}
set.seed(1)
```
This section describes the main workflow of SCORPIUS without going into the details of the R code. For a more detailed explanation, see the vignettes listed above.
To start using SCORPIUS, simply write:
```{r libraries, message=FALSE}
library(SCORPIUS)
```
The `ginhoux` dataset [@Schlitzer2015] contains 248 dendritic cell progenitors in one of three cellular states: MDP, CDP or PreDC. Note that this is a reduced version of the dataset, for packaging reasons. See `?ginhoux` for more info.
```{r load_ginhoux}
data(ginhoux)
expression <- ginhoux$expression
group_name <- ginhoux$sample_info$group_name
```
With the following code, SCORPIUS reduces the dimensionality of the dataset and provides a visual overview of the dataset.
In this plot, cells that are similar in terms of expression values will be placed closer together than cells with dissimilar expression values.
```{r reduce_dimensionality}
space <- reduce_dimensionality(expression, "spearman")
draw_trajectory_plot(space, group_name, contour = TRUE)
```
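To give an intuition for why similar cells end up close together, the idea can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: it uses classical multidimensional scaling (`cmdscale`) on a Spearman correlation distance, which is not necessarily what `reduce_dimensionality` does internally, and `toy_expr`, `cell_dist` and `toy_space` are hypothetical names.

```R
set.seed(42)

# Hypothetical toy data: 20 cells (rows) x 50 genes (columns)
toy_expr <- matrix(
  rpois(20 * 50, lambda = 5),
  nrow = 20,
  dimnames = list(paste0("cell", 1:20), paste0("gene", 1:50))
)

# cor() correlates columns, so transpose to correlate cells with each other
cell_cor <- cor(t(toy_expr), method = "spearman")

# Turn correlation into a dissimilarity: highly correlated cells get a small distance
cell_dist <- as.dist(1 - cell_cor)

# Classical MDS: find 3D coordinates whose distances approximate cell_dist
toy_space <- cmdscale(cell_dist, k = 3)
dim(toy_space)
```

Each row of `toy_space` is one cell, and cells with similar expression rankings across genes receive nearby coordinates.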
To infer and visualise a trajectory through these cells, run:
```{r infer_trajectory}
traj <- infer_trajectory(space)
draw_trajectory_plot(space, group_name, traj$path, contour = TRUE)
```
To identify candidate marker genes, run:
```{r find_tafs, message=F, warning=F}
# warning: setting num_permutations to 10 requires a long time (~30min) to run!
# set it to 0 and define a manual cutoff for the genes (e.g. top 200) for a much shorter execution time.
gimp <- gene_importances(
  expression,
  traj$time,
  num_permutations = 10,
  num_threads = 8,
  ntree = 10000,
  ntree_perm = 1000
)
```
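The faster alternative mentioned in the comments above (no permutations, manual cutoff) could look roughly like the following sketch. It uses a mock data frame in place of a real `gene_importances` result so that it runs on its own; the `importance` column name and the names `gimp_fast`, `n_top` and `gene_sel_top` are assumptions made for illustration.

```R
set.seed(1)

# Hypothetical stand-in for the result of gene_importances(..., num_permutations = 0);
# the `importance` column name is an assumption
gimp_fast <- data.frame(
  gene = paste0("gene", 1:1000),
  importance = runif(1000),
  stringsAsFactors = FALSE
)

# Manual cutoff: keep the 200 genes with the highest importance score
n_top <- 200
ord <- order(gimp_fast$importance, decreasing = TRUE)
gene_sel_top <- gimp_fast$gene[head(ord, n_top)]
length(gene_sel_top)
```

With a manual cutoff there are no p-values, so the number of selected genes is a choice rather than the outcome of a significance test.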
To select the most important genes and scale their expression, run:
```{r calculate_pvalue, message=F, warning=F}
gimp$qvalue <- p.adjust(gimp$pvalue, "BH", length(gimp$pvalue))
gene_sel <- gimp$gene[gimp$qvalue < .05]
expr_sel <- scale_quantile(expression[,gene_sel])
```
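The multiple testing correction used above is the Benjamini-Hochberg ("BH") procedure from base R's `p.adjust`. A small self-contained example, using made-up p-values rather than values from the `ginhoux` analysis, shows how the adjusted values determine which genes pass the 0.05 threshold:

```R
# Hypothetical p-values for five genes (not taken from the ginhoux data)
pvalue <- c(geneA = 0.001, geneB = 0.01, geneC = 0.02, geneD = 0.05, geneE = 0.2)

# Benjamini-Hochberg: the i-th smallest of n p-values is multiplied by n / i,
# then a cumulative minimum from the largest down keeps the values monotone
qvalue <- p.adjust(pvalue, method = "BH")

# Keep genes whose adjusted p-value is below 0.05
selected <- names(qvalue)[qvalue < 0.05]
selected
# geneA, geneB and geneC pass; geneD is adjusted to 0.0625 and is dropped
```

Note that `geneD` has a raw p-value of 0.05 but does not survive the correction, which is why filtering is done on `qvalue` rather than `pvalue`.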
```{r echo=F}
# reverse the trajectory. This does not change the results in any way,
# other than the heatmap being ordered more logically.
# traj <- reverse_trajectory(traj)
```
To visualise the expression of the selected genes, use the `draw_trajectory_heatmap` function.
```{r visualise_tafs, fig.keep='first'}
draw_trajectory_heatmap(expr_sel, traj$time, group_name)
```
Finally, these genes can also be grouped into modules as follows:
```{r moduled_tafs, fig.keep='first'}
modules <- extract_modules(scale_quantile(expr_sel), traj$time, verbose = F)
draw_trajectory_heatmap(expr_sel, traj$time, group_name, modules)
```
## References