Reimplement stitching algorithm #1032
Force-pushed from 6dc389b to 5bd591e.
Create a module for handling CIGAR strings and their related alignment formats. It includes functions for mapping coordinates between the query sequence and the reference sequence, and classes with methods to extend, convert, and translate those coordinates. It also includes a class for managing CIGAR hits, with methods to slice CIGAR operations, check for overlap, and convert operations to a multiple sequence alignment (MSA). Together, these provide a more comprehensive set of tools for handling and interpreting CIGAR strings and alignments.
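The coordinate mapping between query and reference can be illustrated with a minimal sketch. It is not the cigar_tools API from this PR: `parse_cigar` and `query_to_ref_map` are hypothetical names, and the sketch assumes 0-based positions and the standard SAM rules for which operations consume query and reference bases.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration; not the cigar_tools API from this PR.
CIGAR_PATTERN = re.compile(r"(?:\d+[MIDNSHP=X])*")
OP_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification: which operations consume query and reference bases.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '2M1I2M' into (length, operation) pairs."""
    if not CIGAR_PATTERN.fullmatch(cigar):
        raise ValueError(f"Invalid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in OP_PATTERN.findall(cigar)]


def query_to_ref_map(cigar: str, ref_start: int = 0) -> Dict[int, int]:
    """Map each aligned query position to its reference position (0-based).

    Inserted and soft-clipped query bases have no reference position,
    so they are left out of the mapping.
    """
    mapping: Dict[int, int] = {}
    query_pos, ref_pos = 0, ref_start
    for length, op in parse_cigar(cigar):
        for _ in range(length):
            # Only operations that consume both sequences align a pair of bases.
            if op in CONSUMES_QUERY and op in CONSUMES_REF:
                mapping[query_pos] = ref_pos
            if op in CONSUMES_QUERY:
                query_pos += 1
            if op in CONSUMES_REF:
                ref_pos += 1
    return mapping
```

For example, `query_to_ref_map("2M1I2M1D1M")` maps query positions 0, 1, 3, 4, 5 to reference positions 0, 1, 2, 3, 5: query base 2 is an insertion, and the deletion skips reference position 4.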
Force-pushed from 5bd591e to 2a8aec8.
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##           master    #1032      +/-   ##
==========================================
+ Coverage   86.42%   88.06%   +1.64%
==========================================
  Files          28       29       +1
  Lines        6109     7150    +1041
==========================================
+ Hits         5280     6297    +1017
- Misses        829      853      +24
Force-pushed from 646736e to cde06c8.
The updated function creates a zero-initialized list with the same length as the input strings, then compares the strings using a moving-window average in both the forward and reverse directions. Scanning in both directions is intended to make the sequence comparison more thorough and robust. Also add a docstring to it.
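A two-pass moving-window comparison like the one described might look roughly like this. The name `sliding_concordance` and the default window size of 30 are assumptions for illustration, not necessarily what the PR uses; in this sketch each pass contributes half of a position's final score.

```python
from collections import deque
from typing import List


def sliding_concordance(left: str, right: str, window_size: int = 30) -> List[float]:
    """Hypothetical sketch: score each position by the match rate in a moving
    window, averaged over a forward pass and a reverse pass."""
    if len(left) != len(right):
        raise ValueError("Sequences must have the same length.")

    # Same length as the inputs, initialized with zeros.
    result = [0.0] * len(left)

    def slide(indices: range) -> None:
        # The window starts full of zeros, so scores near the starting edge
        # of each pass are lower until the window fills with real comparisons.
        window = deque([0] * window_size, maxlen=window_size)
        total = 0
        for i in indices:
            match = int(left[i] == right[i])
            # Appending to a full deque drops window[0], so subtract it first.
            total += match - window[0]
            window.append(match)
            result[i] += (total / window_size) / 2

    slide(range(len(left)))              # forward pass
    slide(range(len(left) - 1, -1, -1))  # reverse pass
    return result
```

Because the reverse pass starts from the other end, positions near either edge still get one pass with a well-filled window.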
Instead of appending the newly stitched part to the end, prepend it to the start. This ensures that it is processed in the next iteration of the loop.
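A toy version of that loop illustrates why prepending matters. It assumes integer intervals stand in for contigs and a hypothetical `try_merge` stands in for the real stitching step: the merged result goes to the front of the queue, so it is the very next item compared against the remaining parts.

```python
from collections import deque
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # hypothetical stand-in for a contig's span


def try_merge(a: Interval, b: Interval) -> Optional[Interval]:
    """Hypothetical stitching step: merge two closed intervals if they overlap."""
    if a[0] <= b[1] and b[0] <= a[1]:
        return (min(a[0], b[0]), max(a[1], b[1]))
    return None


def stitch_all(parts: List[Interval]) -> List[Interval]:
    queue = deque(parts)
    done: List[Interval] = []
    while queue:
        current = queue.popleft()
        for other in list(queue):
            merged = try_merge(current, other)
            if merged is not None:
                queue.remove(other)
                # Prepend, so the merged part is processed on the next cycle.
                queue.appendleft(merged)
                break
        else:
            # Nothing left overlaps this part, so it is finished.
            done.append(current)
    return done
```

With appending instead, a merged part would wait behind every other queued part before being compared again.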
Force-pushed from d3c4aa7 to 0164ba6.
If we make addition of CIGAR strings commutative, then it is not associative. For predictable addition, associativity is more important than commutativity.
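A small model shows the trade-off, assuming CIGAR addition means concatenation in which adjacent runs of the same operation are merged (the `concat` helper is hypothetical, not the PR's implementation). Concatenation is associative, but swapping the operands changes the result, so it is not commutative.

```python
from typing import List, Tuple

Cigar = List[Tuple[int, str]]  # (length, operation) pairs, e.g. [(3, "M")]


def concat(a: Cigar, b: Cigar) -> Cigar:
    """Hypothetical CIGAR addition: concatenate, merging equal adjacent runs."""
    result = list(a)
    for length, op in b:
        if result and result[-1][1] == op:
            # Same operation at the seam: extend the previous run.
            result[-1] = (result[-1][0] + length, op)
        else:
            result.append((length, op))
    return result


x, y, z = [(3, "M")], [(2, "I")], [(4, "M")]
# Associative: grouping does not matter.
assert concat(concat(x, y), z) == concat(x, concat(y, z))
# Not commutative: order matters.
assert concat(x, y) != concat(y, x)
```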
Force-pushed from 8cd66d8 to d3dd2ca.
Force-pushed from d3dd2ca to 0f21847.
This is a big pull request, so I started with one module: cigar_tools. Overall, it looks pretty good, but I've started a few threads for discussion.
Force-pushed from 1add042 to 6ea286e.
Also add a check for the text of the error messages. Co-authored-by: Don Kirkby <donkirkby@users.noreply.github.com>
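One way to check both the exception type and the text of its message is `unittest`'s `assertRaisesRegex`. The `parse_cigar` function and its message below are hypothetical stand-ins, not the functions tested in this PR.

```python
import re
import unittest


def parse_cigar(cigar: str) -> list:
    """Hypothetical parser, used here only to show the testing pattern."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])*", cigar):
        raise ValueError(f"Invalid CIGAR string: {cigar!r}.")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


class ParseCigarTest(unittest.TestCase):
    def test_invalid_cigar_message(self):
        # Checks the exception type *and* the full text of its message.
        with self.assertRaisesRegex(ValueError, r"^Invalid CIGAR string: '3Q'\.$"):
            parse_cigar("3Q")
```

With pytest, `pytest.raises(ValueError, match=...)` performs the same kind of check.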
Force-pushed from 6ea286e to 8000704.
Force-pushed from b9ff3d3 to f44e405.
PyCharm performs quite a few checks that neither flake8 nor ruff has, for example grammar checks. This commit fixes all the errors I saw in the stitcher code while browsing it in PyCharm.
Force-pushed from 4cad8cf to a770f1c.
And similarly, rename contigs_stitched.csv to contigs.csv
Force-pushed from b03415e to aba3d58.
Force-pushed from aba3d58 to cb28178.
Force-pushed from 0da7699 to 940b186.
Use the unstitched versions of the files.
It's all looking pretty good. I posted a few questions, along with some minor suggestions to consider.
Co-authored-by: Don Kirkby <donkirkby@users.noreply.github.com>
Force-pushed from 2358d28 to acc110b.
Closes #1030