CRABS vs rCRUX for custom reference databases #15
-
From the rCRUX manuscript:

CRUX and its counterparts, MetaCurator, Metaxa2, and CRABS, all similarly rely on a two-step database-generating process (Bengtsson-Palme et al., 2015; Curd et al., 2019; Jeunen et al., 2023; Richardson et al., 2020). First, these tools conduct an in silico PCR or analogous seed acquisition step to generate a set of “seed” sequences containing the primer regions. And since not all sequences are submitted with the primer sequences intact, these tools implement a second step which aligns these seed sequences across the large sequence repositories (e.g. GenBank, ENA, and BOLD) to acquire a comprehensive set of similar sequences. Inherently, these software tools take a brute force approach to generating reference databases that acquire all orthologous sequences and thus, unsurprisingly, require significant computational resources (Jeunen et al., 2023).

"rCRUX databases consistently outperformed the partial RESCRIPt, ecoPCR, CRABS, and MetaCurator databases. Thus, the greater diversity and breadth of species and accessions captured in rCRUX-generated reference databases provide an important tool for improving taxonomic classification."

For example, the refdb R package provides a suite of complementary tools that can be used to merge BOLD and GenBank databases which could provide improved blast-formatted nucleotide databases (Keck et al., 2022). In addition, refdb provides a suite of tools to visualize and summarize output reference databases (Keck et al., 2022). Similar utilities to merge GenBank, EMBL, and BOLD databases are available through CRABS, MARES, RESCRIPt, and BAGS and can be used to generate a more comprehensive starting blastDB database, particularly for CO1 genes (Arranz et al., 2020; Fontes et al., 2021; Jeunen et al., 2023; Robeson et al., 2021). In addition, CRABS and MARES also provide tools to output datasets in a greater diversity of formats for use in additional taxonomic classifiers beyond Anacapa and Qiime2 (Bolyen et al., 2019; Curd et al., 2019).
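
For readers unfamiliar with the "seed" step mentioned in the excerpt, here is a minimal Python sketch of what an in silico PCR does conceptually: locate the forward primer, locate the reverse complement of the reverse primer downstream of it, and keep the region in between. This is only an illustration, not how rCRUX, CRABS, or any of the other tools implement it. The function names, the `max_mismatches` parameter, and the toy sequences are all made up for the example, and it assumes uppercase sequences and checks only one strand.

```python
# Toy in silico PCR (illustrative only; names and defaults are assumptions).
# IUPAC codes let a degenerate primer base match several template bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (IUPAC-aware)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer, seq, max_mismatches, start=0):
    """Return the first index >= start where primer matches seq, or -1."""
    for i in range(start, len(seq) - len(primer) + 1):
        window = seq[i:i + len(primer)]
        mismatches = sum(base not in IUPAC[p] for p, base in zip(primer, window))
        if mismatches <= max_mismatches:
            return i
    return -1


def in_silico_pcr(seq, fwd, rev, max_mismatches=1):
    """Return the region between the two primer sites (primers excluded),
    or None if either primer site is missing from seq."""
    f = find_primer(fwd, seq, max_mismatches)
    if f == -1:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer site.
    r = find_primer(reverse_complement(rev), seq, max_mismatches, f + len(fwd))
    if r == -1:
        return None
    return seq[f + len(fwd):r]


# Hypothetical template: flank + fwd site + amplicon + rev site + flank
template = "TTT" + "ACGTAC" + "GGGCCCAAA" + "TTGCAT" + "CC"
print(in_silico_pcr(template, "ACGTAY", "ATGCAA", max_mismatches=0))
```

A sequence that was deposited without its primer regions returns `None` here, which is exactly why the tools in the excerpt need a second, alignment-based step.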
-
Has anyone used the program CRABS (Creating Reference databases for Amplicon-Based Sequencing) to generate custom databases? According to the paper, the workflow is very similar to rCRUX, namely 1) downloading sequences from databases including NCBI, BOLD, EMBL, and MitoFish (optional keyword search), 2) in silico PCR using primer sequences, 3) seeding a global pairwise alignment with the results from (2).
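
To make step 3 concrete, here is a rough Python sketch of the idea: sequences that failed the in silico PCR (for example because the primer regions were trimmed before submission) are globally aligned against the seed amplicons and kept if they are similar enough. This is not CRABS or rCRUX code. It is a plain Needleman-Wunsch implementation with made-up scoring values, a made-up `min_identity` threshold, and hypothetical function names, and it assumes the candidates are already roughly amplicon-length. Real tools hand this step to optimized aligners and also deal with trimming and coverage.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of a and b.
    Returns matches / alignment length (0.0 for two empty strings)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return matches / length if length else 0.0


def recover_by_alignment(seeds, candidates, min_identity=0.85):
    """Keep candidates whose global identity to any seed meets the threshold."""
    return [
        c for c in candidates
        if any(global_identity(c, s) >= min_identity for s in seeds)
    ]


# Hypothetical data: one seed amplicon and three primer-less candidates
seeds = ["GGGCCCAAAT"]
candidates = ["GGGCCCAAAT", "GGGCCGAAAT", "TTTTTTTTTT"]
print(recover_by_alignment(seeds, candidates))
```

The quadratic cost of each alignment, multiplied across every seed and every candidate in a repository-sized download, gives a sense of why this step dominates the runtime of these pipelines.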