Commit

added seed length
lmdu committed Dec 5, 2024
1 parent 5311409 commit 417e70e
Showing 7 changed files with 43 additions and 27 deletions.
14 changes: 7 additions & 7 deletions docs/api_reference.rst
@@ -39,7 +39,7 @@ pytrf.STRFinder

.. py:method:: as_list()
Put all SSRs in a list and return, each SSR in list has 7 columns including [sequence name, start position, end position, motif sequence, motif length, repeats, SSR length]
Put all SSRs in a list and return it; each SSR in the list has 7 columns: [sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length]

:return: all SSRs found

@@ -68,7 +68,7 @@ pytrf.GTRFinder

.. py:method:: as_list()
Put all GTRs in a list and return, each GTR in list has 7 columns including [sequence name, start position, end position, motif sequence, motif length, repeats, GTR length]
Put all GTRs in a list and return it; each GTR in the list has 7 columns: [sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length]

:return: all GTRs found

@@ -103,7 +103,7 @@ pytrf.ATRFinder

.. py:method:: as_list()
Put all ATRs in a list and return, each ATR in list has 14 columns including [sequence name, seed start position, seed end position, motif sequence, motif length, seed repeat, ATR start position, ATR end position, ATR repeat, ATR length, extend matches, extend substitutions, extend insertions, extend deletions, extend identity]
Put all ATRs in a list and return it; each ATR in the list has 16 columns: [sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length, seed start position, seed end position, seed repeat, seed length, extend matches, extend substitutions, extend insertions, extend deletions, extend identity]

pytrf.ETR
---------
@@ -146,7 +146,7 @@ pytrf.ETR

.. py:method:: as_list()
convert ETR object to a list
convert the ETR object to a list: [sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length]

.. py:method:: as_dict()
@@ -158,7 +158,7 @@ pytrf.ETR

.. py:method:: as_string(separator='\t', terminator='')
convert ETR object to a TSV or CSV string by using separator and terminator
convert the ETR object to a TSV or CSV string using separator and terminator; columns: sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length

:param str separator: a separator between columns

@@ -241,7 +241,7 @@ pytrf.ATR

.. py:method:: as_list()
convert ATR object to a list
convert the ATR object to a list: [sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length, seed start position, seed end position, seed repeat, seed length, extend matches, extend substitutions, extend insertions, extend deletions, extend identity]

.. py:method:: as_dict()
@@ -253,7 +253,7 @@ pytrf.ATR

.. py:method:: as_string(separator='\t', terminator='')
convert ATR object to a TSV or CSV string by using separator and terminator
convert the ATR object to a TSV or CSV string using separator and terminator; columns: sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length, seed start position, seed end position, seed repeat, seed length, extend matches, extend substitutions, extend insertions, extend deletions, extend identity

:param str separator: a separator between columns

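After this change, the first seven columns of an ATR row from `as_list()` are the same seven columns that an SSR or GTR row has, so rows from different finders can be combined by slicing. A minimal sketch, assuming rows shaped like the documented `as_list()` output; the helpers `to_common_row` and `filter_atrs`, their thresholds, and the example values are hypothetical and not part of pytrf.

```python
# Column names mirror the keys used by ATR.as_dict() in src/atr.c.
COMMON_COLUMNS = ["chrom", "start", "end", "motif", "type", "repeat", "length"]
ATR_COLUMNS = COMMON_COLUMNS + [
    "seed_start", "seed_end", "seed_repeat", "seed_length",
    "matches", "substitutions", "insertions", "deletions", "identity",
]


def to_common_row(row):
    """Keep only the 7 leading columns shared by SSR, GTR and ATR rows."""
    return list(row[:len(COMMON_COLUMNS)])


def filter_atrs(rows, min_identity=80.0, min_seed_length=10):
    """Yield 16-column ATR rows that pass identity and seed-length thresholds."""
    i_identity = ATR_COLUMNS.index("identity")
    i_seed_length = ATR_COLUMNS.index("seed_length")
    for row in rows:
        if len(row) != len(ATR_COLUMNS):
            raise ValueError(f"expected {len(ATR_COLUMNS)} columns, got {len(row)}")
        if row[i_identity] >= min_identity and row[i_seed_length] >= min_seed_length:
            yield row


# Made-up rows shaped like the documented ATRFinder.as_list() output.
atrs = [
    ["chr1", 101, 132, "AGC", 3, 10.67, 32, 105, 125, 7, 21, 30, 1, 1, 0, 93.75],
    ["chr1", 501, 520, "AT", 2, 10.0, 20, 505, 512, 4, 8, 15, 3, 1, 1, 75.0],
]
kept = [to_common_row(row) for row in filter_atrs(atrs)]
print(kept)  # [['chr1', 101, 132, 'AGC', 3, 10.67, 32]]
```

Slicing by position works here only because the shared columns now come first; in the 1.3.x order, columns 2 and 3 held the seed coordinates rather than the ATR coordinates.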
4 changes: 3 additions & 1 deletion docs/changelog.rst
@@ -1,9 +1,11 @@
Changelog
=========

Version 1.3.4 (2024-12-06)
Version 1.4.0 (2024-12-06)
--------------------------

- Added seed length to ATR object
- Changed the output column order of findatr
- Fixed tandem repeat overlap
- Fixed wraparound backtrace error

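Because version 1.4.0 changes the findatr column order, scripts that index older output by position need remapping. A minimal sketch that converts a row in the previous 15-column order (the removed `as_list()` column list above, which matches the 15-code format string `OnnOiinnfiiiiif`) into the new 16-column order. The old output has no seed length, so the sketch fills it with `None` instead of guessing. The function and column names are hypothetical.

```python
# Previous findatr / ATR.as_list() order (15 columns, no seed length).
OLD_COLUMNS = [
    "chrom", "seed_start", "seed_end", "motif", "type", "seed_repeat",
    "start", "end", "repeat", "length",
    "matches", "substitutions", "insertions", "deletions", "identity",
]
# Order introduced by this commit (16 columns, adds seed_length).
NEW_COLUMNS = [
    "chrom", "start", "end", "motif", "type", "repeat", "length",
    "seed_start", "seed_end", "seed_repeat", "seed_length",
    "matches", "substitutions", "insertions", "deletions", "identity",
]


def migrate_row(old_row):
    """Reorder a 15-column row from the old layout into the new layout."""
    if len(old_row) != len(OLD_COLUMNS):
        raise ValueError(f"expected {len(OLD_COLUMNS)} columns, got {len(old_row)}")
    values = dict(zip(OLD_COLUMNS, old_row))
    # The old output carries no seed length, so leave it unknown.
    values["seed_length"] = None
    return [values[name] for name in NEW_COLUMNS]


old = ["chr1", 105, 125, "AGC", 3, 7, 101, 132, 10.67, 32, 30, 1, 1, 0, 93.75]
print(migrate_row(old))
# ['chr1', 101, 132, 'AGC', 3, 10.67, 32, 105, 125, 7, None, 30, 1, 1, 0, 93.75]
```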
8 changes: 7 additions & 1 deletion docs/usage.rst
@@ -269,10 +269,12 @@ Find exact microsatellites or simple sequence repeats (SSRs) from fasta/q file.
options:
-h, --help show this help message and exit
-o , --out-file output file (default: stdout)
-f , --out-format output format, tsv, csv or gff (default: tsv)
-f , --out-format output format, tsv, csv, bed or gff (default: tsv)
-r mono di tri tetra penta hexa, --repeats mono di tri tetra penta hexa
minimum repeats for each STR type (default: 12 7 5 4 4 4)
Columns in the findstr tsv, csv and bed output files: sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length.

Find exact generic tandem repeats (GTRs) from fasta/q file.

.. code:: sh
@@ -293,6 +295,8 @@ Find exact generic tandem repeats (GTRs) from fasta/q file.
-r , --min-repeat minimum repeat number (default: 3)
-l , --min-length minimum repeat length (default: 10)
Columns in the findgtr tsv, csv and bed output files: sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length.

Find imperfect or approximate tandem repeats (ATRs)

.. code:: sh
@@ -322,6 +326,8 @@ Find imperfect or approximate tandem repeats (ATRs)
-x , --max-extend-length
maximum length allowed to extend (default: 2000)
Columns in the findatr tsv, csv and bed output files: sequence or chromosome name, start position, end position, motif sequence, motif length, repeat number, repeat length, seed start position, seed end position, seed repeat number, seed length, number of matches, number of substitutions, number of insertions, number of deletions, and identity of the extension alignment between the imperfect repeat and its perfect counterpart.

Extract tandem repeat sequences and flanking sequences according to the results of findatr, findgtr or findstr.

.. code:: sh
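The column lists above are enough to read tsv or csv output by name instead of by index. A minimal sketch, assuming one record per line with no header row (an assumption, not confirmed by the docs) and the column orders documented above; `parse_line` is a hypothetical helper whose keys mirror `ATR.as_dict()` in src/atr.c.

```python
# (name, converter) pairs in the documented column order.
COMMON = [("chrom", str), ("start", int), ("end", int), ("motif", str),
          ("type", int), ("repeat", float), ("length", int)]
ATR_EXTRA = [("seed_start", int), ("seed_end", int), ("seed_repeat", int),
             ("seed_length", int), ("matches", int), ("substitutions", int),
             ("insertions", int), ("deletions", int), ("identity", float)]


def parse_line(line, sep="\t"):
    """Parse one findstr/findgtr (7 columns) or findatr (16 columns) output line.

    repeat is read as float so that fractional ATR repeat numbers also parse.
    """
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) == len(COMMON):
        schema = COMMON
    elif len(fields) == len(COMMON) + len(ATR_EXTRA):
        schema = COMMON + ATR_EXTRA
    else:
        raise ValueError(f"unexpected number of columns: {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


str_line = "chr1\t10\t33\tAT\t2\t12\t24\n"
atr_line = "chr1\t101\t132\tAGC\t3\t10.666667\t32\t105\t125\t7\t21\t30\t1\t1\t0\t93.75\n"
print(parse_line(str_line)["repeat"])       # 12.0
print(parse_line(atr_line)["seed_length"])  # 21
```

Pass `sep=","` for csv output; gff output has a different layout and is not handled by this sketch.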
2 changes: 1 addition & 1 deletion pytrfcli.py
@@ -242,7 +242,7 @@ def main():
default = 70,
metavar = '',
type = float,
help = "minimum identity from 0 to 100 (default: 70)"
help = "minimum identity for extending, 0 to 100 (default: 70)"
)

parser_atrfinder.add_argument('-x', '--max-extend',
31 changes: 17 additions & 14 deletions src/atr.c
@@ -18,17 +18,18 @@ static PyObject* pytrf_atr_repr(pytrf_ATR *self) {
}

static PyObject* pytrf_atr_as_list(pytrf_ATR *self) {
return Py_BuildValue("OnnOiinnfiiiiif", self->seqid, self->sstart, self->send, self->motif,
self->mlen, self->srepeat, self->start, self->end, self->repeat,
self->length, self->matches, self->substitutions, self->insertions,
return Py_BuildValue("OnnOifinniiiiiif", self->seqid, self->start, self->end, self->motif,
self->mlen, self->repeat, self->length, self->sstart, self->send,
self->srepeat, self->slen, self->matches, self->substitutions, self->insertions,
self->deletions, self->identity);
}

static PyObject* pytrf_atr_as_dict(pytrf_ATR *self) {
return Py_BuildValue("{s:O,s:n,s:n,s:O,s:i,s:n,s:n,s:i,s:f,s:i,s:i,s:i,s:i,s:i,s:f}", "chrom", self->seqid,
"start", self->start, "end", self->end, "motif", self->motif, "type", self->mlen,
"seed_start", self->sstart, "seed_end", self->send, "seed_repeat", self->srepeat,
"repeat", self->repeat, "length", self->length, "matches", self->matches,
return Py_BuildValue("{s:O,s:n,s:n,s:O,s:i,s:f,s:i,s:n,s:n,s:i,s:i,s:i,s:i,s:i,s:i,s:f}",
"chrom", self->seqid, "start", self->start, "end", self->end,
"motif", self->motif, "type", self->mlen, "repeat", self->repeat,
"length", self->length, "seed_start", self->sstart, "seed_end", self->send,
"seed_repeat", self->srepeat, "seed_length", self->slen, "matches", self->matches,
"substitutions", self->substitutions, "insertions", self->insertions,
"deletions", self->deletions, "identity", self->identity);
}
@@ -72,12 +73,13 @@ static PyObject* pytrf_atr_as_string(pytrf_ATR *self, PyObject *args, PyObject *

percent = PyFloat_FromDouble(self->identity);
repeat = PyFloat_FromDouble(self->repeat);
retval = PyUnicode_FromFormat("%S%s%zd%s%zd%s%S%s%d%s%d%s%zd%s%zd%s%S%s%d%s%d%s%d%s%d%s%d%s%S%s",
self->seqid, separator, self->sstart, separator, self->send, separator,
self->motif, separator, self->mlen, separator, self->srepeat, separator,
self->start, separator, self->end, separator, repeat, separator,
self->length, separator, self->matches, separator, self->substitutions, separator,
self->insertions, separator, self->deletions, separator, percent, terminator);
retval = PyUnicode_FromFormat("%S%s%zd%s%zd%s%S%s%d%s%S%s%d%s%zd%s%zd%s%d%s%d%s%d%s%d%s%d%s%d%s%S%s",
self->seqid, separator, self->start, separator, self->end, separator,
self->motif, separator, self->mlen, separator, repeat, separator,
self->length, separator, self->sstart, separator, self->send, separator,
self->srepeat, separator, self->slen, separator, self->matches, separator,
self->substitutions, separator, self->insertions, separator, self->deletions,
separator, percent, terminator);
Py_DECREF(percent);
Py_DECREF(repeat);
return retval;
@@ -108,7 +110,8 @@ static PyMemberDef pytrf_atr_members[] = {
{"type", T_INT, offsetof(pytrf_ATR, mlen), READONLY},
{"seed_start", T_PYSSIZET, offsetof(pytrf_ATR, sstart), READONLY},
{"seed_end", T_PYSSIZET, offsetof(pytrf_ATR, send), READONLY},
{"seed_repeat", T_PYSSIZET, offsetof(pytrf_ATR, repeat), READONLY},
{"seed_repeat", T_INT, offsetof(pytrf_ATR, srepeat), READONLY},
{"seed_length", T_INT, offsetof(pytrf_ATR, slen), READONLY},
{"repeat", T_FLOAT, offsetof(pytrf_ATR, repeat), READONLY},
{"length", T_INT, offsetof(pytrf_ATR, length), READONLY},
{"matches", T_INT, offsetof(pytrf_ATR, matches), READONLY},
3 changes: 3 additions & 0 deletions src/atr.h
@@ -35,6 +35,9 @@ typedef struct {
//seed repeats
int srepeat;

//seed length
int slen;

//motif length
int mlen;

8 changes: 5 additions & 3 deletions src/itr.c
@@ -491,6 +491,7 @@ static PyObject* pytrf_itrfinder_next(pytrf_ITRFinder *self) {
atr->sstart = seed_start + 1;
atr->send = seed_end + 1;
atr->srepeat = seed_repeat;
atr->slen = seed_length;
atr->repeat = tandem_length * 1.0 / j;
atr->length = tandem_length;
atr->matches = tandem_match;
@@ -612,9 +613,10 @@ static PyObject* pytrf_itrfinder_as_list(pytrf_ITRFinder *self) {
tandem_repeat = tandem_length * 1.0 / j;

if (tandem_identity >= self->min_identity) {
tmp = Py_BuildValue("Onnsiinnfiiiiif", self->seqname, seed_start + 1, seed_end + 1, self->motif,
j, seed_repeat, tandem_start, tandem_end, tandem_repeat, tandem_length, tandem_match,
tandem_substitute, tandem_insert, tandem_delete, tandem_identity);
tmp = Py_BuildValue("Onnsifinniiiiiif", self->seqname, tandem_start, tandem_end, self->motif, j,
tandem_repeat, tandem_length, seed_start + 1, seed_end + 1, seed_repeat,
seed_length, tandem_match, tandem_substitute, tandem_insert, tandem_delete,
tandem_identity);

PyList_Append(itrs, tmp);
Py_DECREF(tmp);
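In itr.c the ATR repeat number is set to `tandem_length * 1.0 / j`, that is, the repeat length divided by the motif length, so it is usually fractional. A small worked check of that relation on a row in the new column order; `expected_repeat` and `check_repeat` are hypothetical helpers, and the relative tolerance is an assumption chosen because `repeat` is exposed as a `T_FLOAT` member in src/atr.c, which suggests single-precision storage.

```python
import math


def expected_repeat(length, motif_length):
    """Repeat number as itr.c computes it: repeat length divided by motif length."""
    return length * 1.0 / motif_length


def check_repeat(row, rel_tol=1e-6):
    """Compare the repeat column with length / motif length.

    Reads only the first 7 columns, so 16-column ATR rows can be passed directly.
    """
    motif_length, repeat, length = row[4], row[5], row[6]
    return math.isclose(repeat, expected_repeat(length, motif_length), rel_tol=rel_tol)


row = ["chr1", 101, 132, "AGC", 3, 10.666667, 32]
print(check_repeat(row))       # True
print(expected_repeat(30, 3))  # 10.0
```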
