Reimplement the stitcher #1030
Comments
The current version of the algorithm (ea58060) works for simple cases.
The current version of the algorithm (7a153c0) works on real-world examples and produces the expected results. The remaining work is finding and fixing individual bugs.
Currently working on the diagnostics, which turned out to be useful for finding bugs. Reordering goals.
Part of the diagnostics is the visualizer (diagram maker), which is based on logs. It turned out to be almost as difficult to implement as the stitcher itself. The first version is implemented in 7e84f61.
The task list in the issue description has been updated to better reflect the conceptual progress and milestones we've achieved, as documented in our commits.
The introduction of the new stitcher changes the contents and handling of some input/output files. Below is a breakdown:
Same Content
The following table lists files that have identical contents in both the old and new versions. The dash symbol (
Same Role
The following table lists files that serve the same purpose in the pipeline across most use cases and within the proviral pipeline specifically:
The existing implementation of stitching has been shown to produce nonsensical results in certain cases. The results of stitching should be a logical summation of its parts, but currently they sometimes are not. The root cause appears to be the reliance on regions of the reference genome rather than on the contigs produced by the assembler. When some regions have low concordance with the reference genome, they are aligned differently, which produces conflicting versions of the overlaps between them.
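To illustrate what "a logical summation of its parts" could mean at the contig level, here is a minimal sketch, not the actual stitcher. It assumes each contig has already been aligned to the reference and is described only by an inclusive coordinate interval. The names `AlignedContig`, `overlapping_pairs`, and `merged_coverage` are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class AlignedContig:
    """Hypothetical record: a contig and its inclusive reference interval."""
    name: str
    ref_start: int
    ref_end: int


def overlapping_pairs(contigs: List[AlignedContig]) -> List[Tuple[str, str]]:
    """Return the names of every pair of contigs whose intervals intersect."""
    return [
        (a.name, b.name)
        for a, b in combinations(contigs, 2)
        if a.ref_start <= b.ref_end and b.ref_start <= a.ref_end
    ]


def merged_coverage(contigs: List[AlignedContig]) -> List[Tuple[int, int]]:
    """Union of the reference intervals; overlapping contigs collapse into one."""
    merged: List[Tuple[int, int]] = []
    for contig in sorted(contigs, key=lambda c: c.ref_start):
        if merged and contig.ref_start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], contig.ref_end))
        else:
            merged.append((contig.ref_start, contig.ref_end))
    return merged


# Two overlapping contigs and one separate contig.
contigs = [
    AlignedContig("a", 1, 100),
    AlignedContig("b", 80, 200),
    AlignedContig("c", 300, 400),
]
pairs = overlapping_pairs(contigs)
coverage = merged_coverage(contigs)
```

In this sketch the overlap between contigs is derived once, from the contigs themselves, so there is a single version of each overlap. A real stitcher would also have to reconcile the sequences inside each overlap, which is outside what this example shows.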
Objectives:
Tasks:
This includes determining how to handle single and multiple contig scenarios.
Treat cross-alignments as anomalies.
test_correct_processing_of_two_overlapping_and_one_separate_contig_2.svg
.strand parameter is checked every time an arrow is drawn in the visualizer.
Make the old contigs.csv file still produce the same output as before the stitcher by introducing a new output file contigs_stitched.csv to be used in downstream analyses.
contigs_unstitched.csv and remap_unstitched_conseq.csv
Notes:
This reimplementation may provide opportunities for simplification in the regions alignment code, which is currently very complex.