three or more apparent haplotypes at repeats #29
Comments
A mix of the two might be good; I don't see how the coordinates would map linearly without exhaustively enumerating that space in training, which seems unlikely given the sub-1% errors. Detecting the repeated sequence and requiring full overlap seems like it would waste a lot of knowledge, right?
The coordinates would be relative to the window, or maybe to any underlying …
The learner can't seem to figure out that there is a repeat and the …
We could add a feature which was the length beyond a repeat at which the …
On Tue, May 31, 2016 at 3:46 PM Nicolás Della Penna <
By the way, we do retain information from the alignments to the graph, so …
On Tue, May 31, 2016 at 3:47 PM Erik Garrison erik.garrison@gmail.com
Even the relative coordinates, to express this linearly, would have to be quadratic in the reference, right? Marking the reads that don't overlap the locus seems like it is asking a lot of the linear learner; in particular, here those are the ones with all the info, and in other cases it is the opposite, so I'm afraid it would average out. Exposing the repeat structure seems important, agreed.
In freebayes we can exclude these. In fact, although that isn't the default, there is discussion and support from folks like @chapmanb that we should do so by default, as it improves performance. @nikete to explain: freebayes decides on a haplotype window over which it infers the genotype(s) for the samples in the analysis. In cases where there is an exact repeat, or the sequence at the locus is a short repeat followed by low-complexity sequence, we use a haplotype window long enough to reach one shannon per base ( The graph feature should be capturing even the stuff that doesn't fully overlap the locus.
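To make the window-extension idea above concrete, here is a minimal sketch, not freebayes' actual implementation. It assumes "one shannon per base" means the base-composition entropy of the window reaches 1 bit, and it grows the window symmetrically; `shannon_entropy_bits`, `extend_window`, `min_entropy` and `max_length` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from typing import Tuple


def shannon_entropy_bits(seq: str) -> float:
    """Shannon entropy (bits per base) of the base composition of seq."""
    if not seq:
        return 0.0
    total = len(seq)
    counts = Counter(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extend_window(reference: str, start: int, end: int,
                  min_entropy: float = 1.0,
                  max_length: int = 200) -> Tuple[int, int]:
    """Grow the half-open window [start, end) by one base on each side
    until its entropy reaches min_entropy (assumed threshold), the window
    hits max_length, or the reference is exhausted."""
    while (shannon_entropy_bits(reference[start:end]) < min_entropy
           and end - start < max_length
           and (start > 0 or end < len(reference))):
        start = max(0, start - 1)
        end = min(len(reference), end + 1)
    return start, end


# A homopolymer run flanked by more complex sequence (made-up example).
reference = "GGCT" + "A" * 8 + "CGTC"
window = extend_window(reference, 6, 10)
```

Starting inside the run of A's, the window keeps growing until enough flanking sequence is included to push the entropy over the threshold, which is the behaviour described for exact repeats and low-complexity sequence.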
In the last case the graph feature doesn't help us because we've inappropriately broken the site into two. That's another problem that I find a bit confusing... I thought I'd resolved this as well but apparently not enough.
It's not the right thing to do to call reference. At these examples we have non-reference genotypes.
Just as a note for the future: this will work well on 50X stuff, but for MinION or low-coverage methods it might be better not to take them. To center, we could use the middle window to be the high-entropy region.
A lot of our errors look like this:
But when we go to tview, we see the problem: the reads match the reference, but only when they don't fully overlap the locus.
There are a few possible solutions to this.
Detect the repeated sequences and require full overlap (similar to freebayes).
Include the alignment start and end coordinates as a feature.
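A rough sketch of what the two options above could look like per read, assuming half-open, 0-based coordinates. The `Alignment` class and both function names are hypothetical and not taken from this project's code or from freebayes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alignment:
    name: str
    start: int  # 0-based, inclusive reference start
    end: int    # 0-based, exclusive reference end


def fully_overlaps(aln: Alignment, locus_start: int, locus_end: int) -> bool:
    """Option 1: True only if the read spans the whole locus [locus_start, locus_end)."""
    return aln.start <= locus_start and aln.end >= locus_end


def coordinate_features(aln: Alignment, locus_start: int, locus_end: int) -> Dict[str, int]:
    """Option 2: alignment start/end expressed relative to the locus.
    A negative flank means the read stops inside the locus on that side."""
    return {
        "left_flank": locus_start - aln.start,
        "right_flank": aln.end - locus_end,
        "fully_overlaps": int(fully_overlaps(aln, locus_start, locus_end)),
    }


# Made-up reads around a made-up repeat at [100, 120).
reads: List[Alignment] = [
    Alignment("r1", 50, 150),   # spans the repeat
    Alignment("r2", 90, 110),   # ends inside the repeat
    Alignment("r3", 105, 160),  # starts inside the repeat
]
spanning = [r.name for r in reads if fully_overlaps(r, 100, 120)]
features = {r.name: coordinate_features(r, 100, 120) for r in reads}
```

The first option throws away `r2` and `r3` entirely; the second keeps them but leaves it to the learner to work out that a negative flank makes a reference match uninformative.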
@nikete thoughts?