bindz-rbp is a computational workflow that predicts binding sites of RNA-binding proteins in a given input RNA sequence, implemented as a Snakemake pipeline 🐍
- General information
- Installation instructions
- Optional: Download and parse PWMs from ATtRACT database
- Workflow execution
- Contributing
- Contact
bindz-rbp predicts binding sites of distinct regulators in an RNA sequence by calculating posterior probabilities with MotEvo, given the sequence specificity of regulators, represented as position-specific weight matrices. It is intended to help in the analysis of individual reporter sequences by predicting regulators that may act on the sequence, as well as how the binding may be affected by specific mutations introduced in the reporter sequences. The tool scans the input sequence with a set of position-specific weight matrices (PWMs) representing the binding specificity of individual RNA-binding proteins. The run time scales linearly with both the sequence length and the number of PWMs, so please make sure to test it on your architecture before running it on batches of sequences.
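To illustrate why the run time grows with both the sequence length and the number of PWMs, here is a minimal sketch of a sliding-window scan. It is not MotEvo's algorithm (MotEvo computes posterior probabilities); it is a plain log-odds scan, and the toy matrix, the uniform background and the function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy PWM: one dict of base probabilities per motif position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def scan(sequence, pwm):
    """Return (start, log-odds score) for every window of the sequence."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(
            math.log2(column[base] / BACKGROUND)
            for column, base in zip(pwm, window)
        )
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

Each PWM of width w is scored against every one of the L - w + 1 windows of a sequence of length L, so scanning with N matrices costs roughly N times as much as scanning with one.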
The tool is implemented as a Snakemake workflow.
The main outputs of the pipeline are:
- combined_MotEvo_results.tsv: a tab-separated file which collects information related to all predicted binding sites of all analyzed motifs into one table.
- binding_sites.bed: a simplified list of binding sites in BED format.
- ProbabilityVsSequence.pdf: a visualisation of binding positions and probabilities in the form of a heatmap.
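For downstream analysis, the BED output can be read with a few lines of Python. The sketch below assumes that binding_sites.bed follows the standard BED convention (tab-separated, first three columns are sequence name, 0-based start and exclusive end); any further columns the pipeline may write are ignored here, and the example rows are made up.

```python
import csv
import io

# Made-up rows for illustration; not actual pipeline output.
EXAMPLE_BED = "seq1\t10\t17\n# a comment line\nseq1\t40\t46\n"


def read_bed(handle):
    """Yield (name, start, end) per BED record; start is 0-based, end is exclusive."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the optional comment/header lines of the BED format.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        yield row[0], int(row[1]), int(row[2])


sites = list(read_bed(io.StringIO(EXAMPLE_BED)))
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("binding_sites.bed", newline="")`.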
Snakemake is a workflow management system that helps to create and execute data processing pipelines. It requires Python 3 and can be most easily installed via the bioconda channel from the anaconda cloud service.
To install the latest version of miniconda please execute:
[Linux]:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
[macOS]:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
source ~/.bashrc
Cloning repositories requires git to be installed (available via conda):
conda install git
Clone this git repository into a desired location (here: bindz-rbp in the current working directory) with the following command:
git clone https://github.com/zavolanlab/bindz-rbp
To help the users in the installation process we have prepared a recipe for a conda virtual environment that contains all the software needed to run bindz-rbp. This environment can be created by the following script:
bash bindz-rbp/scripts/create-conda-environment-main.sh
The built conda environment may then be activated with:
conda activate bindz-rbp
Inside this repository we have included a snapshot of a database of Position Weight Matrices for distinct RNA-binding proteins (ATtRACT: 26-08-2020). We suggest using the pre-formatted files which we have already prepared: resources/ATtRACT_hsa and resources/ATtRACT_mmu for Homo sapiens and Mus musculus, respectively.
However, if the user would like to download and parse a new version of matrices from ATtRACT we describe the procedure below:
Please change directory to the pipeline's root directory:
cd bindz-rbp
To utilize position-specific weight matrices from the ATtRACT database of known RBPs' binding motifs we provide two scripts:
- Download and extract the database into a directory ATtRACT under resources:
  bash scripts/download-ATtRACT-motifs.sh -o resources/ATtRACT
- Parse the database and reformat the PWMs into TRANSFAC format (currently supported species are Homo_sapiens and Mus_musculus):
Homo sapiens
mkdir resources/ATtRACT/ATtRACT_hsa
python scripts/format-ATtRACT-motifs.py \
  --pwms resources/ATtRACT/pwm.txt \
  --names resources/ATtRACT/ATtRACT_db.txt \
  --organism Homo_sapiens \
  --outdir resources/ATtRACT/ATtRACT_hsa
Mus musculus
mkdir resources/ATtRACT/ATtRACT_mmu
python scripts/format-ATtRACT-motifs.py \
  --pwms resources/ATtRACT/pwm.txt \
  --names resources/ATtRACT/ATtRACT_db.txt \
  --organism Mus_musculus \
  --outdir resources/ATtRACT/ATtRACT_mmu
To print information about the script's arguments please type:
python scripts/format-ATtRACT-motifs.py --help
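To sanity-check the reformatted matrices, they can be read back into Python. The exact layout written by format-ATtRACT-motifs.py is not described here, so the sketch below assumes a minimal TRANSFAC-style record: an `ID` line, a header line naming the bases, one numbered row of four values per motif position, and `//` as the record terminator. The example record and the function name are hypothetical.

```python
# Hypothetical record in the assumed TRANSFAC-style layout.
EXAMPLE_RECORD = """\
ID example_motif
PO A C G T
01 0.7 0.1 0.1 0.1
02 0.1 0.1 0.7 0.1
//
"""


def parse_transfac(text):
    """Parse TRANSFAC-style records into {motif_id: [[A, C, G, T], ...]}."""
    motifs = {}
    motif_id, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ID":
            motif_id = fields[1]
        elif fields[0].isdigit():
            # Numbered matrix row: position index followed by four base values.
            rows.append([float(value) for value in fields[1:5]])
        elif fields[0] == "//":
            # End of record: store the motif and reset the state.
            if motif_id is not None:
                motifs[motif_id] = rows
            motif_id, rows = None, []
    return motifs


motifs = parse_transfac(EXAMPLE_RECORD)
```

If the real files use a different header or additional annotation lines, those lines are simply skipped by this parser, but the column order should be checked against the header before relying on the values.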
Please change directory to the pipeline's root directory:
cd bindz-rbp
All the input, output and parameters for the pipeline execution should be specified in a Snakemake configuration file in YAML format. Such a file can be created based on our prepared template located at workflow/config/config-template.yml. Assuming that the user created a config.yml and saved it in the repository's root directory (and that it is the current working directory), the workflow can be executed on the local machine with:
snakemake \
--snakefile="workflow/Snakefile" \
--configfile="config.yml" \
--use-conda \
--cores=1 \
--printshellcmds \
--verbose
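When several sequences need to be analyzed, the same command can be issued once per configuration file. The following is a minimal sketch, not part of the repository: the helper names and the idea of keeping one YAML file per run in a directory are assumptions, and it presumes that each configuration file points to its own output location.

```python
import subprocess
from pathlib import Path


def snakemake_command(configfile, cores=1):
    """Build the argument list for the local run shown above."""
    return [
        "snakemake",
        "--snakefile=workflow/Snakefile",
        f"--configfile={configfile}",
        "--use-conda",
        f"--cores={cores}",
        "--printshellcmds",
        "--verbose",
    ]


def run_batch(config_dir):
    """Run the workflow once per *.yml file in config_dir; stop on the first failure."""
    for configfile in sorted(Path(config_dir).glob("*.yml")):
        subprocess.run(snakemake_command(configfile), check=True)
```

Calling `run_batch("configs")` from the repository's root directory would run the files in a hypothetical `configs` directory one after another, which makes it easy to time a single run before committing to a large batch.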
We also provide an integration test for the pipeline on a small input dataset to check whether the installation was successful:
bash tests/integration/execution/snakemake_local_run_conda_environments.sh
This project lives off your contributions, be it in the form of bug reports, feature requests, discussions, or fixes and other code changes. 🙂
Please refer to the contributing guidelines if you are interested in contributing. Please mind the code of conduct for all interactions with the community.
For questions or suggestions regarding the code, please use the issue tracker. For any other inquiries, please contact us by email: zavolab-biozentrum@unibas.ch 📨