SomTD

Retrotransposons make up approximately 40% of the human genome, and subfamilies of Alu, LINE1 and SVA elements remain actively mobile. Detecting transposable element (TE) insertions is nonetheless difficult because of chimera artifacts, and estimating the insertion rate from bulk or single-cell sequencing data is also challenging.
SomTD detects TE insertions and estimates the insertion rate using both traditional rule-based algorithms and a convolutional neural network (CNN) model. It is applicable to both bulk and single-cell sequencing data. Notable features of SomTD are as follows:

SomTD prioritizes split read pairs and extracts discordant read pairs as required. It uses the supplementary alignments of the split read pairs to pinpoint the insertion location, because the TE part of a clipped read may be reported as the primary alignment, which can place the insertion misleadingly near a reference insertion (fig1).
A lightweight CNN is applied to extract every suitable read pair, in contrast to previous machine/deep learning applications, which focus on the insertion level for insertion detection or genotyping (fig2). This approach allows weak signals of rare insertions to be detected by minimizing data loss, and it distinguishes chimera artifacts using features that are difficult to discern.
In bulk sequencing data, SomTD estimates the insertion rate from the cumulative sum of variant allele fractions, including those of rare insertions. This estimate relies on the tool's sensitivity and accuracy, which remain comparable across bulk and single-cell sequencing data.
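
To make the first feature above more concrete, the following is a minimal sketch of how supplementary alignments recorded in a SAM `SA:Z` tag could be turned into candidate breakpoints. It is not SomTD's implementation: the function names (`parse_sa_tag`, `clipped_lengths`, `candidate_breakpoints`) are hypothetical, the example tag values are made up, and treating the clipped end of a supplementary alignment as the breakpoint is an assumption. The only parts taken from documented sources are the `SA` tag layout from the SAM specification (`rname,pos,strand,CIGAR,mapQ,NM;`) and the inclusive minimum clipped length of 10, which mirrors the default of the `-c` option.

```python
import re
from typing import List, NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


class SuppAln(NamedTuple):
    rname: str
    pos: int  # 1-based leftmost mapped position, as stored in the SA tag
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(sa: str) -> List[SuppAln]:
    """Parse the value of an SA:Z tag into one record per alignment."""
    records = []
    for entry in sa.split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append(SuppAln(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return records


def clipped_lengths(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) clipped lengths; S and H both count as clipping."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return lead, trail


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def candidate_breakpoints(sa: str, cutoff: int = 10) -> List[Tuple[str, int]]:
    """Assumed rule: a clip of at least `cutoff` bases marks a breakpoint
    at the clipped end of the supplementary alignment."""
    found = []
    for aln in parse_sa_tag(sa):
        lead, trail = clipped_lengths(aln.cigar)
        if lead >= cutoff:  # the limit itself is included
            found.append((aln.rname, aln.pos))
        if trail >= cutoff:
            found.append((aln.rname, aln.pos + reference_span(aln.cigar) - 1))
    return found


# Made-up SA tag with three supplementary alignments
example_sa = "chr1,1000,+,60M40S,60,0;chr2,500,-,5S95M,30,1;chr3,200,+,10S90M,60,0;"
breakpoints = candidate_breakpoints(example_sa)
```

Here the `chr2` alignment is skipped because its 5-base clip is below the cutoff, while the 10-base clip on `chr3` passes because the limit is inclusive.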

Dependencies

bedtools
bwa
cutadapt
pysam
pytorch
samblaster
samtools
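
Before running SomTD, it may help to confirm that the dependencies above are available. The snippet below is a small sketch, not part of SomTD: it assumes that bedtools, bwa, cutadapt, samblaster and samtools are expected as executables on `PATH`, and that pysam and pytorch are importable as the Python modules `pysam` and `torch`. The helper name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil

# Assumption: these dependencies are used as command-line executables
COMMAND_LINE_TOOLS = ["bedtools", "bwa", "cutadapt", "samblaster", "samtools"]
# Dependency name -> Python module name (pytorch is imported as "torch")
PYTHON_MODULES = {"pysam": "pysam", "pytorch": "torch"}


def missing_dependencies():
    """Return the names of dependencies that cannot be found."""
    missing = [tool for tool in COMMAND_LINE_TOOLS if shutil.which(tool) is None]
    missing += [
        name
        for name, module in PYTHON_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Missing dependencies: " + ", ".join(absent))
    else:
        print("All dependencies found.")
```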

Run SomTD

SomTD
Usage: SomTD.py [options]

Options:
  -h, --help            show this help message and exit
  --input1=INPUT1       input file, bam/sam/fastq, please use .bam .sam
                        .fq/.fastq as a suffix, mandatory
  --input2=INPUT2       input file, if fastq, please use .fq/.fastq as a
                        suffix, optional
  -o OUTPUTPATH, --outputPath=OUTPUTPATH
                        output path, directory name will be output name,
                        directory will be generated autonomously if not exist,
                        mandatory
  -c CUTOFF, --cutoff=CUTOFF
                        minimum soft-clipped length, limit itself is included,
                        optional, default: 10
  -f FRAGLEN, --frag=FRAGLEN
                        expected fragment length, mandatory
  --std=FRAGSTD         standard deviation of fragment length, mandatory
  -r READLEN, --readLen=READLEN
                        read length, mandatory
  -g GREFERENCE, --Greference=GREFERENCE
                        genome reference sequence, fa, mandatory
  -t TREFERENCE, --Treference=TREFERENCE
                        transposon reference sequence, fa, mandatory
  --TreferenceRecom=TREFERENCERECOM
                        reversed complemented transposon reference sequence,
                        fa, mandatory
  -G GINDEX, --Gindex=GINDEX
                        genome reference sequence bwa index, mandatory
  -T TINDEX, --Tindex=TINDEX
                        transposon reference sequence bwa index, mandatory
  -p PARALLEL, --parallel=PARALLEL
                        number of threads, optional, default: 2
  -m MEMORY, --memory=MEMORY
                        memory per thread used, optional, default: 2G
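
As an illustration of how the options above fit together, the sketch below assembles a SomTD command line in Python. The option names are taken from the usage text; everything else is an assumption made for the example: the helper `build_somtd_command` is hypothetical, all file names and numeric values are placeholders, and the interpreter name `python` and the script path `SomTD.py` may differ in a given installation. It also assumes the options accept the conventional `--option value` form.

```python
import shlex


def build_somtd_command(input1, output_path, frag_len, frag_std, read_len,
                        genome_fa, te_fa, te_revcomp_fa, genome_index, te_index,
                        input2=None, cutoff=10, threads=2, memory="2G",
                        script="SomTD.py"):
    """Build an argument list covering the mandatory options and the
    optional ones, using the defaults documented in the usage text."""
    cmd = ["python", script, "--input1", input1]
    if input2 is not None:
        cmd += ["--input2", input2]  # second input file, optional
    cmd += [
        "-o", output_path,
        "-c", str(cutoff),
        "-f", str(frag_len),
        "--std", str(frag_std),
        "-r", str(read_len),
        "-g", genome_fa,
        "-t", te_fa,
        "--TreferenceRecom", te_revcomp_fa,
        "-G", genome_index,
        "-T", te_index,
        "-p", str(threads),
        "-m", memory,
    ]
    return cmd


# Placeholder paths and values, for illustration only
command = build_somtd_command(
    input1="sample_R1.fastq",
    input2="sample_R2.fastq",
    output_path="results/sample",
    frag_len=300,
    frag_std=50,
    read_len=150,
    genome_fa="genome.fa",
    te_fa="transposon.fa",
    te_revcomp_fa="transposon_revcomp.fa",
    genome_index="genome_bwa_index",
    te_index="transposon_bwa_index",
    threads=4,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the placeholder values have been replaced with real paths.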
