In sample MSVRNA3-HIV_S7 from run 210115_M04401, there is an alignment starting just 1 nucleotide before the end of vpr. Because there is nothing to align in amino acid space, there are no values in `coord2conseq` (the dictionary helping us translate conseq coordinates to reference coordinates), so the code fails in `count_match` when trying to find the maximum coordinate.
There are two options to solve this:
1. in the consensus aligner, in `find_amino_alignments`, check the size of an alignment before adding it (we already check that it's larger than 0 for a match; we could check whether it's at least 3 nucleotides long), or
2. in the consensus aligner, in `count_match`, check whether the alignment is large enough to do anything.
I'd prefer to catch alignments that are too small as soon as possible (option 1), but option 2 might help catch other weird edge cases.
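A rough sketch of what the option 2 guard could look like. It assumes the failure comes from calling `max()` on an empty `coord2conseq` and that the coordinates of interest are the dictionary keys. The helper name `max_conseq_coordinate` is made up for illustration and is not the actual code in `count_match`.

```python
def max_conseq_coordinate(coord2conseq):
    """Return the largest coordinate key, or None if the mapping is empty.

    Hypothetical helper: the real count_match may look at values instead
    of keys, or handle the empty case differently.
    """
    # max() raises ValueError on an empty iterable unless a default is given.
    return max(coord2conseq, default=None)


def has_enough_to_count(coord2conseq):
    """True when there is at least one mapped coordinate to work with."""
    return max_conseq_coordinate(coord2conseq) is not None
```

With a guard like this, the caller could skip the alignment when `has_enough_to_count` returns `False` instead of raising.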
Generally speaking, we can find the reading frame of an alignment, even if it is smaller than 3 nucleotides - we usually just round up to the nearest-larger integer number of amino acids and align. In this particular case, the error happened only because we were right at a region boundary and there was not enough sequence to align to.
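To illustrate the rounding, here is a minimal sketch of turning a nucleotide length into a whole number of amino acids. The function name `amino_length` is hypothetical, and it ignores reading-frame offsets, which the real aligner has to account for.

```python
def amino_length(nuc_length):
    """Round a nucleotide count up to a whole number of amino acids.

    Hypothetical helper: integer ceiling of nuc_length / 3.
    """
    if nuc_length < 0:
        raise ValueError("nuc_length must be non-negative")
    return (nuc_length + 2) // 3
```

So alignments of 1, 2, or 3 nucleotides all map to a single amino acid, which only works if the region still has sequence left to align against.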
I'm a little worried about very fragmented alignments if we throw alignments of 1 or 2 nucleotides away, so I'm working on option 2 now instead. I'm also double-checking some cases with very bad alignments to see if we ever need these small alignments.