RDscan is a snakemake workflow to find deletions and putative regions of difference (RDs) in mycobacterium tuberculosis complex (MTBC) genomes, it is also capable to determine already known or user defined RDs.
The usage of this workflow is described in the Snakemake Workflow Catalog, alternatively it can be installed as described below.
Use the Conda package manager and BioConda channel to install RDscan.
If you do not have conda installed do the following:
# Download Conda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Set permissions
chmod -X Miniconda3-latest-Linux-x86_64.sh
# Install
bash Miniconda3-latest-Linux-x86_64.sh
Set up channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Get RDscan snakemake workflow:
git clone /~https://github.com/dbespiatykh/RDscan.git
Install all required dependencies:
cd RDscan
conda install -c conda-forge mamba
mamba env create --file environment.yml
Activate RDscan environment:
conda activate RDscan
Run pipeline:
snakemake --conda-frontend mamba --use-conda -j {Number of cores}
It is recommended to use dry run if you are running pipeline for the first time, to see if everything is in working order, for this you can use -n
flag:
snakemake -n
Output in the results
directory will contain four tables: RD_putative.tsv
, RD_known.tsv
, RD_known.xlsx
, and RD_known.bin.tsv
Example of the RD_putative.tsv
:
Table containing all discovered putative RDs.
RD - Known RDs that intersects with deletion breakpoints; SIZE - Estimated size of predicted deletion.
Values in cells represent deletion length in the sample.
CHROM | START | END | SIZE | RD | TYPE | ERR015582 | ERR017778 | ERR017782 | ERR019852 |
---|---|---|---|---|---|---|---|---|---|
NC_000962 | 333828 | 338580 | 5800 | DEL | 7113 | 7084 | 7050 | ||
NC_000962 | 340400 | 340645 | 245 | DEL | |||||
NC_000962 | 350935 | 351175 | 238 | DEL | 300 | 204 | 240 | ||
NC_000962 | 361769 | 362988 | 1391 | DEL | 1833 | 1392 | 1833 | 1390 |
Example of the RD_known.tsv
:
Table containing proportion of coverage in particular RDs.
Sample | N-RD25_tbA | N-RD25_tbB | N-RD25bov/cap | N-RD25das |
---|---|---|---|---|
ERR015582 | 0.883562 | 0.856164 | 0.856164 | 0.808219 |
ERR017778 | 0 | 0 | 0 | 0.41791 |
ERR017782 | 1.021277 | 1.042553 | 1.106383 | 0.978723 |
ERR019852 | 0 | 0 | 0 | 0.386364 |
Example of the RD_known.xlsx
:
Same as the RD_known.tsv
, but in a XLSX format with applied contiditional formatting.
Conditional formatting corresponds with threshold value in a config.yml
file.
Binary version of the RD_known.bin.tsv
:
Sample | N-RD25_tbA | N-RD25_tbB | N-RD25bov/cap | N-RD25das |
---|---|---|---|---|
ERR015582 | 0 | 0 | 0 | 0 |
ERR017778 | 1 | 1 | 1 | 0 |
ERR017782 | 0 | 0 | 0 | 0 |
ERR019852 | 1 | 1 | 1 | 0 |
If you use RDscan
for your research, please cite the pipeline:
D. Bespiatykh, J. Bespyatykh, I. Mokrousov, and E. Shitikov, A Comprehensive Map of Mycobacterium tuberculosis Complex Regions of Difference, mSphere, Volume 6, Issue 4, 21 July 2021, Page e00535-21, https://doi.org/10.1128/mSphere.00535-21
All references for the tools utilized by the RDscan
can be found in the CITATIONS.md
file.