fastANI vs ANIm output differences #291
widdowquinn started this conversation in Ideas
Re: IDEA 2: the current "Coverage" for fastANI is (matching fragments)/(all fragments), so it is bound to overestimate whenever a fragment counts as a match even though the match covers less than the full length of that fragment. If the current calculation is (frags1 * len)/(frags2 * len), then we can save two multiplication operations by using frags1/frags2 instead, because the common fragment length cancels.
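The arithmetic in the comment can be checked with a toy example. This is a minimal sketch, not pyani's or fastANI's code. It assumes every fragment has the same fixed length, and the names `FRAG_LEN`, `fragment_coverage`, `fragment_coverage_simplified` and `aligned_coverage` are hypothetical. It shows two things. First, the length-weighted ratio and the plain fragment-count ratio are identical. Second, counting a partially aligned fragment as a whole match gives a higher coverage than counting only the aligned bases.

```python
from typing import List

# Hypothetical, assumed fixed fragment length in bases (real genomes also
# have a shorter final fragment, which this toy ignores).
FRAG_LEN = 3000


def fragment_coverage(n_matched: int, n_total: int, frag_len: int = FRAG_LEN) -> float:
    """Coverage as (matched fragments * len) / (all fragments * len)."""
    return (n_matched * frag_len) / (n_total * frag_len)


def fragment_coverage_simplified(n_matched: int, n_total: int) -> float:
    """The same ratio with the common fragment length cancelled out."""
    return n_matched / n_total


def aligned_coverage(aligned_lengths: List[int], n_total: int, frag_len: int = FRAG_LEN) -> float:
    """Coverage counting only the bases that actually align in each matched fragment."""
    return sum(min(a, frag_len) for a in aligned_lengths) / (n_total * frag_len)


# Four fragments; three are reported as matches, but one aligns over only 1000 bases.
aligned = [3000, 3000, 1000]
print(fragment_coverage(len(aligned), 4))             # 0.75
print(fragment_coverage_simplified(len(aligned), 4))  # 0.75
print(aligned_coverage(aligned, 4))                   # about 0.583
```

Whether the real fastANI coverage is affected depends on how a fragment "match" is defined there; the toy only shows the direction of the bias under the stated assumption.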
As a first impression, there appear to be systematic differences between fastANI and ANIm. From the literature, we expect the percentage identities to differ, so that part is no surprise. It would be useful to compare the methods quantitatively.
IDEA 1: a new graphical/report output for comparing values between two runs. Possible CLI:
By default, I'm thinking this would write graphical and tabular output showing, for each pairwise comparison, the difference between the two runs: a heatmap and a table of the differences, plus a CSV/tab-separated file (tidy format) listing each pairwise comparison and its difference.
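The tidy output for IDEA 1 could look roughly like the following minimal sketch. It is not an existing pyani command or format. The input shape (a dict mapping a `(genome1, genome2)` pair to a value), the function names `compare_runs` and `write_tidy`, and the column names are all hypothetical. The sketch keeps only pairs present in both runs and reports `run2 - run1`.

```python
import csv
import io

FIELDS = ["genome1", "genome2", "run1", "run2", "difference"]  # hypothetical column names


def compare_runs(run1, run2):
    """Return one tidy row per genome pair present in both runs."""
    rows = []
    for pair in sorted(set(run1) & set(run2)):
        v1, v2 = run1[pair], run2[pair]
        rows.append({
            "genome1": pair[0],
            "genome2": pair[1],
            "run1": v1,
            "run2": v2,
            "difference": v2 - v1,  # positive when the second run reports a higher value
        })
    return rows


def write_tidy(rows, handle):
    """Write the rows as tab-separated tidy output with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


# Made-up percentage identities for illustration only.
anim = {("A", "B"): 98.5, ("A", "C"): 91.25}
fastani = {("A", "B"): 97.5, ("A", "C"): 93.0}
buf = io.StringIO()
write_tidy(compare_runs(anim, fastani), buf)
print(buf.getvalue())
```

Because each row carries both genome names, the same rows could be pivoted into a square matrix of differences for the heatmap.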
IDEA 2: there seems to be a systematic difference between the coverage reported by fastANI and by ANIm, and we should investigate it. My first thought is that fastANI's k-mer approach collapses repeats (where ANIm preserves them), so the denominator in the coverage calculation is proportionately smaller. We should check whether this is the case.
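A toy illustration of the repeat-collapsing hypothesis in IDEA 2 follows. It is explicitly not how fastANI computes coverage. It only shows the general effect. When a count is based on distinct k-mers, every copy of a repeat contributes once. When a count is made per position, as an alignment-based method that keeps both repeat copies would do, every copy contributes, so the per-position count is larger. The value of `K`, the sequences, and the function names are hypothetical.

```python
K = 4  # hypothetical k-mer size, far smaller than any real tool would use


def total_kmers(seq, k=K):
    """Number of k-mer positions in seq; every repeat copy is counted."""
    return max(len(seq) - k + 1, 0)


def distinct_kmers(seq, k=K):
    """Number of distinct k-mers in seq; repeat copies collapse to one."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


repeat = "ACGTTGCA"
genome = repeat + repeat  # a tandem duplication of the same segment
print(total_kmers(genome), distinct_kmers(genome))  # 13 8
```

If the hypothesis holds, genomes with more repeat content should show a larger fastANI/ANIm coverage discrepancy. That is one way the check could be framed.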