An R package for integrated scRNA-seq data analysis using the RPCI algorithm

RISC

Overview

Integrated analysis of single-cell RNA-sequencing (scRNA-seq) data from multiple batches or studies is often necessary to learn about functional changes in cellular states upon experimental perturbation, or about cell type relationships in a developmental lineage. Here we introduce a new algorithm (RPCI) that uses the gene-eigenvectors from a reference dataset to establish a global frame for projecting all datasets. Its clear advantage is that it preserves genuine gene expression differences in matching cell types between samples, such as those present in cells at distinct developmental stages or in perturbed vs. control studies. The R package "RISC" (Robust Integration of Single Cell RNA-seq data) implements the RPCI algorithm, with additional functions for scRNA-seq analysis, such as clustering cells, identifying cluster marker genes, detecting differentially expressed genes between experimental conditions, and, importantly, outputting integrated gene expression values for downstream data analysis.
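The idea of a reference-defined frame can be illustrated with a small base-R sketch. This is a simplified illustration under stated assumptions, not the RPCI implementation: the matrices are simulated, the variable names (ref, query, gene_vecs) are hypothetical, and plain SVD stands in for whatever decomposition and correction steps RISC actually applies.

```r
set.seed(1)
n_genes <- 50

# Toy log-expression matrices (genes x cells); simulated, not real data
ref   <- matrix(rnorm(n_genes * 30), nrow = n_genes)
query <- matrix(rnorm(n_genes * 20), nrow = n_genes)

# Center each gene with the reference means so both datasets share one origin
gene_means <- rowMeans(ref)
ref_c   <- ref - gene_means
query_c <- query - gene_means

# Gene eigenvectors of the reference: left singular vectors of the
# centered genes x cells matrix (assumption: plain SVD, top 5 components)
n_pcs <- 5
gene_vecs <- svd(ref_c, nu = n_pcs, nv = 0)$u   # n_genes x n_pcs

# Project every dataset onto the same reference eigenvectors
ref_emb   <- t(ref_c)   %*% gene_vecs   # 30 cells x 5 components
query_emb <- t(query_c) %*% gene_vecs   # 20 cells x 5 components
```

Because both embeddings use the same gene_vecs, cells from the two datasets end up in one coordinate system, which is the property the overview describes.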

RISC v1.7 update, Mar 26, 2024

This version mainly fixes problems caused by the update of the dependent package igraph. We also added a clustering resolution parameter ('res') to the InPlot and scCluster (for the 'louvain' method) functions. Additionally, the mean expression and the percentage of expressing cells are now reported for both groups in the marker and differential expression results. Installation note: the dependent package "sparseMatrixStats" is available from Bioconductor only (it cannot be installed from GitHub).

RISC v1.6 update

This version mainly fixes problems caused by updates of dependent packages.

RISC v1.5 update

Changes from the last release (v1.0)
(1) Replace the dependency "RcppEigen" with "RcppArmadillo"; sparse matrices are now fully supported in core functions.
(2) Replace the dependency "pbmcapply" with "pbapply".
(3) Optimize the "scMultiIntegrate" function and reduce memory consumption; the new RISC release can support integration of datasets with >1.5 million cells and 10,000 genes.
(4) During data integration, all genes expressed in the individual datasets are retained in the integrated data. Genes expressed in all samples are labeled in the "rowdata" of the RISC object.
(5) Convert "logcounts" in the integrated RISC object ("object@assay$logcount") from one large matrix to a list of logcounts matrices, one corrected matrix per individual dataset. To output the full integrated matrix, use mat0 = do.call(cbind, object@assay$logcount)
(6) Rename function "readscdata" -> "readsc"
(7) Rename function "read10Xgenomics" -> "read10X_mtx"
(8) Parameter names in some functions have changed.
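The do.call(cbind, ...) step from item (5) can be tried on a toy list. In this minimal sketch the list logcount_list is a hypothetical stand-in for object@assay$logcount, and it assumes that every per-dataset matrix has the same genes in the same row order.

```r
library(Matrix)

genes <- paste0("gene", 1:4)

# Toy stand-in for object@assay$logcount: one genes x cells matrix per dataset
logcount_list <- list(
  Matrix(c(0, 1, 0, 2, 3, 0, 0, 1), nrow = 4, sparse = TRUE,
         dimnames = list(genes, c("s1_c1", "s1_c2"))),
  Matrix(c(1, 0, 0, 0, 0, 2, 0, 0, 4, 0, 1, 0), nrow = 4, sparse = TRUE,
         dimnames = list(genes, c("s2_c1", "s2_c2", "s2_c3")))
)

# Column-bind the per-dataset matrices into one genes x cells matrix
mat0 <- do.call(cbind, logcount_list)
dim(mat0)
```

The result has one column per cell across all datasets, so its column count is the sum of the per-dataset cell counts.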

Added new functions
(1) In the "scMarker" and "AllMarker" functions, add the Wilcoxon Rank Sum and Signed Rank model.
(2) In the "scMarker", "AllMarker" and "scDEG" functions, add a pseudo-cell option (cells are binned to generate meta-cells) for detecting marker genes.
(3) Add a "slot" parameter to the "DimPlot" function so that external dimension reduction results can be added to a RISC object and plotted, e.g. add PHATE results (phate0) to a RISC object: obj0@DimReduction$cell.phate = phate0; DimPlot(obj0, slot = "cell.phate", colFactor = 'Group', size = 2, label = TRUE)
(4) Add the "read10X_h5" function for 10X Genomics h5 files.
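To make items (1) and (2) concrete, the following base-R sketch runs a Wilcoxon rank-sum test on one simulated gene, first on single cells and then on pseudo-cells. It is an illustration only: the data are simulated, the bin size of 5 and all variable names are assumptions, and it does not reproduce how scMarker, AllMarker or scDEG bin cells internally.

```r
set.seed(42)

# Simulated expression of one gene in two clusters (40 cells each)
expr    <- c(rpois(40, lambda = 5), rpois(40, lambda = 1))
cluster <- rep(c("A", "B"), each = 40)

# Cell-level Wilcoxon rank-sum test: cluster A vs. cluster B
cell_test <- wilcox.test(expr[cluster == "A"], expr[cluster == "B"],
                         exact = FALSE)

# Pseudo-cells: average every 5 consecutive cells within a cluster
bin_size    <- 5
idx_in_clus <- ave(seq_along(expr), cluster, FUN = seq_along)
bin_id      <- paste(cluster, ceiling(idx_in_clus / bin_size), sep = "_")
pseudo      <- tapply(expr, bin_id, mean)
pseudo_clus <- sub("_.*", "", names(pseudo))

# Same test on the meta-cells (8 per cluster here)
pseudo_test <- wilcox.test(pseudo[pseudo_clus == "A"],
                           pseudo[pseudo_clus == "B"], exact = FALSE)
```

Binning lowers the number of observations per group and smooths dropout noise, which is the usual motivation for testing on meta-cells rather than on individual cells.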

Removed old functions
(1) Delete the "readHTSeqdata" function.

Install dependent packages:

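install.packages(c("Matrix", "irlba", "doParallel", "foreach", "Rtsne", "umap", "MASS", "pbapply", "Rcpp", "RcppArmadillo", "densityClust", "FNN", "igraph", "RColorBrewer", "ggplot2", "gridExtra", "pheatmap", "hdf5r"))
# sparseMatrixStats is distributed through Bioconductor, so BiocManager is needed
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("sparseMatrixStats")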
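
Before installing RISC, it can help to see which dependencies are still missing. The sketch below only reports them and installs nothing; the helper is_installed is a hypothetical name introduced for illustration.

```r
# Dependencies listed above, split by repository
cran_pkgs <- c("Matrix", "irlba", "doParallel", "foreach", "Rtsne", "umap",
               "MASS", "pbapply", "Rcpp", "RcppArmadillo", "densityClust",
               "FNN", "igraph", "RColorBrewer", "ggplot2", "gridExtra",
               "pheatmap", "hdf5r")
bioc_pkgs <- "sparseMatrixStats"

# Hypothetical helper: TRUE if the package namespace can be loaded
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

missing_cran <- cran_pkgs[!vapply(cran_pkgs, is_installed, logical(1))]
missing_bioc <- bioc_pkgs[!vapply(bioc_pkgs, is_installed, logical(1))]

if (length(missing_cran) > 0) {
  message("Missing CRAN packages: ", paste(missing_cran, collapse = ", "))
}
if (length(missing_bioc) > 0) {
  message("Missing Bioconductor packages: ", paste(missing_bioc, collapse = ", "))
}
```

Anything reported under missing_cran can be passed to install.packages(), and anything under missing_bioc to BiocManager::install().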

Install RISC:

# install_github() is provided by the devtools package
devtools::install_github("https://github.com/bioinfoDZ/RISC.git")

The RISC package can also be downloaded and installed manually (Link):

install.packages("/Path/to/RISC_1.7.tar.gz", repos = NULL, type = "source")

Vignettes

Here we provide a vignette that shows the key steps in analyzing example scRNA-seq datasets from basal or squamous cell carcinoma patients before and after anti-PD-1 therapy (GSE123813). Please also check the RISC functions for reading data directly from h5 files.

RISC v1.0 Link

RISC v1.6 Link

We also provide an example of how to convert a Seurat object to a RISC object (to use the new features, please reinstall the RISC package). A RISC object can be converted to a Seurat object in a similar way.

RISC v1.0 Link

Note: RISC v1.6 was developed in R (v4.2.2), and this vignette was tested with the same R version.

Note: RISC v1.7 was developed in R (v4.3.3).

Contents:

(1) RISC package: "RISC_1.7.tar.gz"
(2) Vignette for GSE123813: "GSE123813_Vignette_RISC_v1.6.pdf"
(3) GSE123813 directory: contains the cell-type, patient and treatment annotations. File location: "/GSE123813/Raw_Data/bcc_annotation.tsv"

Old RISC versions: "RISC_1.0.tar.gz", "RISC_1.6.0.tar.gz"

Citation:

Liu Y, Tao W, Zhou B, Zheng D (2021) Robust integration of multiple single-cell RNA sequencing datasets using a single reference space. Nat Biotechnol 39(7):877-884.
