What does 'disagree' mean in the output? #2772
Great question! I vaguely remembered writing something about it somewhere, but had to go digging 😆, and it was by no means easy to find! Here's what I wrote in this blog post:

Interpreting the CSV file

The CSV file has five columns: name, taxid, status, rank_info, and lineage. There are three possible status values at present: 'found', 'nomatch', and 'disagree'.
For example, look at this line in the CSV file:
TARA_ASW_MAG_00029 has k-mers that are shared between different orders: 'Pseudomonadales' and 'Rhodobacterales'. Therefore, the classifier status is 'disagree'.

Make sense? Thanks again for asking!

Since you're using the LCA module, I also wanted to point you at this issue: #2760. Nowadays we suggest using gather followed by tax instead.

If you have any more questions, ask away! And please do leave this issue open so I can update the docs appropriately.
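To illustrate why k-mers shared across two orders produce 'disagree', here is a toy Python sketch. It is not sourmash's code: the function names, hash values, and database are hypothetical, and it assumes each k-mer hash maps to exactly one lineage and that all lineages have the same depth. The real classifier also aggregates k-mer counts and ignores lineages supported by too few k-mers.

```python
# Toy model of LCA-style classification (hypothetical; not sourmash's code).
# Assumptions: each k-mer hash maps to exactly one lineage, all lineages
# have the same depth, and there is no minimum k-mer count threshold.

def lowest_common_lineage(lineages):
    """Return the longest rank-by-rank prefix shared by every lineage."""
    shared = []
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break  # lineages split at this rank, so stop here
        shared.append(names_at_rank[0])
    return tuple(shared)


def classify(query_hashes, hash_to_lineage):
    """Return (status, lineage) for a set of query k-mer hashes."""
    matched = {hash_to_lineage[h] for h in query_hashes if h in hash_to_lineage}
    if not matched:
        return "nomatch", ()  # no k-mers found in the database
    lineage = lowest_common_lineage(sorted(matched))
    # One distinct lineage means every k-mer agrees; more than one means
    # the k-mers conflict and only the shared prefix is reported.
    status = "found" if len(matched) == 1 else "disagree"
    return status, lineage


# Illustrative lineages truncated at order (hypothetical database).
PSEUDOMONADALES = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales")
RHODOBACTERALES = ("Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhodobacterales")
DB = {1: PSEUDOMONADALES, 2: PSEUDOMONADALES, 3: RHODOBACTERALES}

print(classify({1, 2}, DB))     # all k-mers agree on one lineage
print(classify({1, 2, 3}, DB))  # k-mers split between two orders
print(classify({99}, DB))       # no k-mers in the database
```

In this toy, the second call returns 'disagree' with a lineage that stops at the phylum, because the two orders sit in different classes; the third returns 'nomatch'.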
WOW, thanks a lot for the detailed answer! And thanks for pointing me to the gather > tax approach. For the record, I am very happy with the classification, because when I BLAST the 'nomatch' bins, for example, they only have very distant hits to some uncultured/unclassified isolates that nobody knows anything about :D
… analysis (#2777)

This PR adds cautionary notes to the command-line docs, and updates the information on classifying signatures to suggest using tax instead of LCA, and even explains why :). There is more work to be done (we need to add more tutorials, and adjust the language in classifying-signatures around gather and LCA), but this is a nice standalone PR!

Fixes #2562
Fixes #2772
Fixes #2773
Adds information from #2760
Addresses #2535
Apologies if this is something trivial or has been asked before, but I could not find an answer in this forum or by googling it.
I use sourmash to classify genomes/MAGs. I understand that in the output table a successful classification is marked 'found' and an unsuccessful one 'nomatch', but what does 'disagree' mean? I assume it means that matches were found but, based on the software's cutoff, it is not very happy with them, i.e. the matches are not great? Is that it? If so, can I still use the result, and how reliable is it?
Thanks!

Here is an example:
2023_1030076_1_MG_127_23112020_S0_L001bin.47.fa,disagree,d__Bacteria,p__Bacteroidota,c__Bacteroidia,o__Bacteroidales,f__Barnesiellaceae,g__Barnesiella_A,,
2023_1030076_1_MG_127_23112020_S0_L001bin.4.fa,nomatch,,,,,,,,
2023_1030076_1_MG_127_23112020_S0_L001bin.52.fa,found,d__Bacteria,p__Spirochaetota,c__Spirochaetia,o__Treponematales,f__Treponemataceae,g__Treponema_D,s__Treponema_D sp900767955,
2023_1030076_1_MG_127_23112020_S0_L001bin.57.fa,found,d__Bacteria,p__Verrucomicrobiota,c__Verrucomicrobiae,o__Verrucomicrobiales,f__Akkermansiaceae,g__Akkermansia,,
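To check how far down each bin was actually classified, one option is a small Python sketch using the standard csv module. It assumes the column layout of the example rows above (bin name, status, then one column per rank from superkingdom to strain, left empty when unassigned); the rank labels and the summarize helper are hypothetical, not part of sourmash. If your file starts with a header line, skip it first, e.g. with next(reader).

```python
import csv
import io

# Rank columns assumed from the example rows: after the bin name and
# status there is one column per rank, empty when the rank is unassigned.
RANKS = ["superkingdom", "phylum", "class", "order",
         "family", "genus", "species", "strain"]

EXAMPLE = """\
2023_1030076_1_MG_127_23112020_S0_L001bin.47.fa,disagree,d__Bacteria,p__Bacteroidota,c__Bacteroidia,o__Bacteroidales,f__Barnesiellaceae,g__Barnesiella_A,,
2023_1030076_1_MG_127_23112020_S0_L001bin.4.fa,nomatch,,,,,,,,
2023_1030076_1_MG_127_23112020_S0_L001bin.52.fa,found,d__Bacteria,p__Spirochaetota,c__Spirochaetia,o__Treponematales,f__Treponemataceae,g__Treponema_D,s__Treponema_D sp900767955,
2023_1030076_1_MG_127_23112020_S0_L001bin.57.fa,found,d__Bacteria,p__Verrucomicrobiota,c__Verrucomicrobiae,o__Verrucomicrobiales,f__Akkermansiaceae,g__Akkermansia,,
"""


def summarize(rows):
    """Yield (name, status, deepest assigned rank or None) for each row."""
    for name, status, *lineage in rows:
        assigned = [rank for rank, taxon in zip(RANKS, lineage) if taxon]
        yield name, status, assigned[-1] if assigned else None


for name, status, deepest in summarize(csv.reader(io.StringIO(EXAMPLE))):
    print(f"{name}\t{status}\t{deepest}")
```

On these four rows it reports genus for bin.47 ('disagree'), no rank for bin.4 ('nomatch'), species for bin.52, and genus for bin.57, so a 'disagree' row can still carry a lineage down to some rank.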