scGraphETM: : Graph-Based Deep Learning Approach for Unraveling Cell Type-Specific Gene Regulatory Networks from Single-Cell Multi-Omics Data
scGraphETM is a novel deep learning framework designed to uncover cell type-specific gene regulatory networks (GRNs) from single-cell multi-omics data. The model integrates scRNA-seq and scATAC-seq data through a hybrid architecture that combines three key components: (1) a Variational Autoencoder (VAE) with self-attention for dimensionality reduction and feature learning, (2) a Graph Neural Network (GNN) that leverages prior knowledge from gene regulatory networks, and (3) a linear decoder design that enhances interpretability. The model uniquely incorporates the xTrimoGene approach for generating cell-specific embeddings and employs GraphSAGE for efficient neighborhood aggregation in large-scale networks. Using an embedded topic model as the encoder, scGraphETM can simultaneously perform multiple tasks including cell type clustering, cross-modality imputation, and cell type-specific TF-RE relationship inference.
# Create conda environment
conda create -n scgraph python=3.10
conda activate scgraph
# Install PyTorch and PyTorch Geometric
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg -c pyg
# Install other dependencies
pip install -r requirements.txt
- Python >= 3.10
- PyTorch >= 1.12.0
- PyTorch Geometric
- scanpy
- anndata
- pybedtools
project/
├── components/ # Model components
├── data/ # Data directory
├── model.py # Main model implementation
├── train.py # Training script
├── impute_main.py # Imputation script
└── eval_grn.py # GRN evaluation script
python train.py \
--date 2025-02-05 \
--use_gnn \
--model_name base_model
python impute_main.py \
--date 2025-02-05 \
--model_name base_model \
--plot_path ./result/imputation \
--use_gnn \
python eval_grn.py \
--date 2025-02-05 \
--model_name base_model \
--chipseq_file path/to/chipseq_data.bed \
--cell_type CD4 \
--use_gnn \
result/
├── PBMC/
│ └── 2025-02-05/
│ ├── figure/ # Training visualizations
│ ├── imputation/ # Imputation results
│ └── grn_eval/ # GRN evaluation results
└── weights/
└── PBMC/
└── best_model_2025-02-05_base_model.pth
-
RNA-seq Data
- Format: h5ad (AnnData)
- Required columns:
- obs: 'cell_type', 'batch'
- var: 'highly_variable'
-
ATAC-seq Data
- Format: h5ad (AnnData)
- Required columns:
- obs: 'cell_type', 'batch'
- var: 'chrom', 'chromStart', 'chromEnd'
-
ChIP-seq Data (for GRN evaluation)
- Format: BED
- Required columns: chrom, start, end, name
data/
├── PBMC/
│ ├── RNA_count.h5ad # Raw RNA count matrix
│ ├── ATAC_count.h5ad # Raw ATAC count matrix
│ ├── hvg_only_rna_count.h5ad # Filtered RNA data
│ └── hvg_tg_re_nearby_dist_1000000_ATAC.h5ad # Processed ATAC data
└── node2vec/
└── PBMC_model_512_400.pt # Pre-trained embeddings
- PBMC (Peripheral Blood Mononuclear Cells) 10X Genomics
- BMMC (Bone Marrow Mononuclear Cells) GSE194122 GSE204684
- Cerebral Cortex GSE204684
- CisTarget
- Chip-Seq Atlas
Each dataset should have matching cells between RNA-seq and ATAC-seq modalities.