Ultra-long ONT sequencing benefits metagenomic profiling through high alignment specificity. Yet its high per-read error rate remains a hurdle both for distinguishing closely related pathogens at lower taxonomic ranks and for refined, drug-level antimicrobial resistance (AMR) prediction. In this study, we present MegaPath-Nano, the successor to the NGS-based MegaPath: accurate compositional analysis software with drug-level AMR identification for ONT metagenomic sequencing data. MegaPath-Nano takes raw ONT reads as input and performs data cleansing, taxonomic profiling, and drug-level AMR detection within a single workflow. The main outputs are 1) a taxonomic profiling report down to strain level with estimated abundance; and 2) an integrated class- and drug-level AMR report in tabular format, with supporting information from the different detection tools. As a key feature for taxonomic profiling, MegaPath-Nano performs a global optimization on multiple alignments and reassigns reads that are likely misplaced to the single most likely species. For consistent and comprehensive AMR detection, MegaPath-Nano uses a novel consensus-based approach that incorporates a collection of AMR software and databases. We benchmarked MegaPath-Nano against other state-of-the-art software, including WIMP, Kraken 2, MetaMaps, ARMA and ARGpore, using real sequencing data, and it achieved the best performance in both tasks. MegaPath-Nano is therefore a well-rounded ONT metagenomic tool for practical clinical use.
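To make the idea of an integrated report with per-tool support more concrete, here is a toy sketch in Python. It is not MegaPath-Nano's consensus algorithm, which is not described in this text; the function name `consensus_table`, the `min_support` threshold, and the tool labels are all hypothetical and only illustrate merging calls from several detectors.

```python
from collections import defaultdict


def consensus_table(calls_by_tool, min_support=1):
    """Map each drug to the sorted list of tools that reported it.

    Hypothetical illustration only: drugs reported by fewer than
    `min_support` tools are dropped from the result.
    """
    support = defaultdict(set)
    for tool, drugs in calls_by_tool.items():
        for drug in drugs:
            support[drug].add(tool)
    return {
        drug: sorted(tools)
        for drug, tools in sorted(support.items())
        if len(tools) >= min_support
    }


# Made-up calls from two placeholder tools, not real results.
example_calls = {
    "tool_a": {"ampicillin", "ciprofloxacin"},
    "tool_b": {"ampicillin"},
}
```

With `min_support=2`, only drugs reported by both placeholder tools remain, each listed with the tools that support it.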
Storage requirement: 80G
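Before installing, it may help to confirm that the target filesystem has enough room. The following is a minimal sketch, assuming "80G" means 80 decimal gigabytes; the helper name `has_enough_space` is hypothetical and not part of MegaPath-Nano.

```python
import shutil

REQUIRED_GB = 80  # storage requirement stated above, assumed to be decimal GB


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    status = "OK" if has_enough_space() else "insufficient"
    print(f"Free space in the current directory: {status}")
```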
# Option 1, Bioconda: prioritize channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n mpn -c bioconda megapath-nano
conda activate mpn
# Option 2, Conda virtual environment: prioritize channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n mpn python=3.6.10
conda activate mpn
# installing all dependencies for both modules
conda install pandas psutil pybedtools porechop==0.2.4 bioconvert seqtk minimap2 bcftools samtools==1.9 'pysam>=0.16.0' tabulate cgecore==1.5.6 "ncbi-amrfinderplus>=3" "rgi>=5"
# MegaPath-Nano-Amplicon filter module
conda install clair=2.1.1 parallel=20191122
# git clone MegaPath-Nano
git clone --depth 1 https://github.com/HKU-BAL/MegaPath-Nano
# MegaPath-Nano-Amplicon filter module
cd MegaPath-Nano/bin/realignment/realign/
g++ -std=c++14 -O1 -shared -fPIC -o realigner ssw_cpp.cpp ssw.c realigner.cpp
g++ -std=c++11 -shared -fPIC -o debruijn_graph -O3 debruijn_graph.cpp
gcc -Wall -O3 -pipe -fPIC -shared -rdynamic -o libssw.so ssw.c ssw.h
cd -
cd MegaPath-Nano/bin/Clair-ensemble/Clair.beta.ensemble.cpu/clair/
g++ ensemble.cpp -o ensemble
cd -
cd MegaPath-Nano/bin/samtools-1.13
./configure && make && make install
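After a conda-based install, a quick check that the main command-line tools resolve on `PATH` can catch a broken environment early. This is a minimal sketch: the executable names are assumed to match what the conda packages listed above install, and `missing_tools` is a hypothetical helper, not part of MegaPath-Nano.

```python
import shutil

# Executable names assumed to be provided by the conda packages listed above.
EXPECTED_TOOLS = (
    "porechop", "seqtk", "minimap2", "bcftools", "samtools", "rgi", "amrfinder",
)


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

Run it inside the activated `mpn` environment; an empty result only shows that the executables exist, not that their versions are the ones pinned above.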
# Option 3, Docker: build the image and start a container
sudo docker build -f ./Dockerfile -t mpn_image .
sudo docker run -it mpn_image /bin/bash
# Option 1, Bioconda: cd ${CONDA_PREFIX}/MegaPath-Nano
# conda info --env can show the ${CONDA_PREFIX} in the current environment.
# Option 2, Conda Virtual Env: cd ./MegaPath-Nano (the git clone)
# Option 3, Docker: cd /opt/MegaPath-Nano
cd ${MEGAPATH_NANO_DIR}
# Taxon
wget -c http://www.bio8.cs.hku.hk/dataset/MegaPath-Nano/MegaPath-Nano_db.v1.0.tar.gz -O - | tar -xvz
# AMR
rgi load --card_json bin/amr_db/card/card.json
amrfinder -u
# Amplicon filter module
wget -c http://www.bio8.cs.hku.hk/dataset/MegaPath-Nano/MegaPath-Nano-Amplicon_db.v1.0.tar.gz -O - | tar -xvz
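The `wget ... -O - | tar -xvz` pattern above unpacks the archive while it downloads, so the tarball never has to be stored in full. A rough Python equivalent is sketched below for environments without `wget`. The helper names are hypothetical, and unlike `wget -c` this sketch cannot resume an interrupted transfer.

```python
import tarfile
import urllib.request


def extract_stream(fileobj, dest="."):
    """Unpack a gzip-compressed tar stream into `dest` without saving the archive first."""
    # Mode "r|gz" reads the archive sequentially, so non-seekable streams work.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        archive.extractall(dest)


def download_and_extract(url, dest="."):
    """Stream `url` straight into the extractor (no resume support)."""
    with urllib.request.urlopen(url) as response:
        extract_stream(response, dest)


# Usage (not executed here because the download is large):
# download_and_extract(
#     "http://www.bio8.cs.hku.hk/dataset/MegaPath-Nano/MegaPath-Nano_db.v1.0.tar.gz"
# )
```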
The latest RefSeq database can be downloaded with the scripts under db_preparation/.
# Taxon
# download RefSeq:
./refseq_download.sh [${DB_DIR}=MegaPath-Nano/genomes/refseq/]
# build assembly metadata:
./updateAssemblyMetadata.sh [${DB_DIR}=MegaPath-Nano/genomes/refseq/] [${ASSEMBLY_DIR}=MegaPath-Nano/genomes/]
# generate config files:
./updateConfigFile.sh [${DB_DIR}=MegaPath-Nano/genomes/refseq/] [${CONFIG_DIR}=MegaPath-Nano/config/]
# prepare SQL db data:
./updateDB.sh [${DB_DIR}=MegaPath-Nano/genomes/refseq/] [${SQL_DIR}=MegaPath-Nano/db/]
# (optional) add custom FASTA sequences to the decoy database
python addDecoyDB.py --decoy_fasta ${fasta}
# AMR
# prepare AMR databases:
./prepareAMR_DB.sh
(1) Run taxonomic analysis and AMR detection module
python megapath_nano.py --query ${fq/fa} [options]
required arguments:
--query
Query file (fastq or fasta)
optional arguments:
--max_aligner_thread INT Maximum number of threads used by the aligner, default: 64. The actual number of threads is min(number of available cores, number of threads specified)
--output_prefix Output Prefix, default: query file name
--output_folder Output folder, default: current working directory
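The thread rule for `--max_aligner_thread` can be written out as a one-line calculation. This is a minimal sketch of the stated rule only; how MegaPath-Nano itself counts available cores is not specified here, so `os.cpu_count()` is an assumption and `effective_threads` is a hypothetical name.

```python
import os

DEFAULT_MAX_ALIGNER_THREAD = 64  # default of --max_aligner_thread stated above


def effective_threads(requested=DEFAULT_MAX_ALIGNER_THREAD, available=None):
    """Return min(available cores, requested threads), as the option text describes."""
    if available is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        available = os.cpu_count() or 1
    return min(available, requested)


if __name__ == "__main__":
    print("Aligner threads with the default setting:", effective_threads())
```

For example, on an 8-core machine the default request of 64 is capped at 8, while a request of 4 on a 16-core machine stays at 4.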
(2) Run taxonomic analysis module only
python megapath_nano.py --query ${fq/fa} --taxon_module_only [options]
(3) Run AMR detection module only with FASTQ/FASTA
python megapath_nano.py --query ${fq/fa} --AMR_module_only [options]
(4) Filter FQ/FA only: Adaptor trimming, read filtering and trimming, human or decoy filtering
python megapath_nano.py --query ${fq/fa} --filter_fq_only [options]
For all available options, please check Usage.md
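When MegaPath-Nano is driven from another script, the four invocations above differ only in one flag. The sketch below builds the argument list for each case. The mode labels (`full`, `taxon`, `amr`, `filter`) and the helper names are hypothetical; only the command-line flags come from the usage text above.

```python
import subprocess

# Hypothetical mode labels mapped to the flags documented above.
MODE_FLAGS = {
    "full": [],
    "taxon": ["--taxon_module_only"],
    "amr": ["--AMR_module_only"],
    "filter": ["--filter_fq_only"],
}


def build_command(query, mode="full", output_folder=None, output_prefix=None):
    """Return the argument list for one megapath_nano.py run."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    command = ["python", "megapath_nano.py", "--query", query] + MODE_FLAGS[mode]
    if output_folder is not None:
        command += ["--output_folder", output_folder]
    if output_prefix is not None:
        command += ["--output_prefix", output_prefix]
    return command


def run(query, **kwargs):
    """Run MegaPath-Nano and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(query, **kwargs), check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.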
(5) Run AMR detection module only with BAM
python megapath_nano_amr.py --query_bam ${bam} --output_folder ${dir} [options]
required arguments:
--query_bam QUERY_BAM
Input bam
--output_folder OUTPUT_FOLDER
Output directory
optional arguments:
--taxon TAXON Taxon-specific options for AMRFinder [e.g. --taxon Escherichia], see usage for the full list of curated organisms
--threads THREADS Max num of threads, default: available num of cores
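Because this entry point expects a BAM rather than FASTQ/FASTA, a cheap sanity check on the input can save a failed run. BAM files are BGZF (gzip-compatible) compressed, and their decompressed content begins with the magic bytes `BAM\1`. The sketch below tests only that; it does not validate the rest of the file, and `looks_like_bam` is a hypothetical helper, not part of MegaPath-Nano.

```python
import gzip
import zlib


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF-compressed and its payload starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        # Missing file, not gzip-compressed, truncated, or corrupt.
        return False
```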
(6) Run amplicon filter module with **FASTQ**
./MegaPath-Nano/bin/runMegaPath-Nano-Amplicon.sh -r ${fq}
The demo data for AMR detection of five patient isolates are available for download at http://www.bio8.cs.hku.hk/dataset/MegaPath-Nano/. Samples were prepared using the ONT Rapid Sequencing Kit and sequenced on ONT R9.4.1 flow cells.
The experimental validation results for these AMR demo datasets are listed in Supplementary_info_AMR.
wget http://www.bio8.cs.hku.hk/dataset/MegaPath-Nano/Escherichia_coli_isolate2_HKUBAL_20200103.fastq
python megapath_nano.py --query Escherichia_coli_isolate2_HKUBAL_20200103.fastq
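Before running the demo, it can be worth confirming that the download completed by counting the reads in the file. This is a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; `fastq_summary` is a hypothetical helper, not part of MegaPath-Nano.

```python
def fastq_summary(path):
    """Return (number of reads, total bases) for a four-line-per-record FASTQ file."""
    reads = 0
    bases = 0
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of each record holds the sequence
                reads += 1
                bases += len(line.strip())
    return reads, bases


# Usage, once the demo file has been downloaded:
# print(fastq_summary("Escherichia_coli_isolate2_HKUBAL_20200103.fastq"))
```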