- Needleman, S.B. and Wunsch, C.D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." Journal of Molecular Biology, 48(3), 443-453.
- Original paper describing the Needleman-Wunsch algorithm for global sequence alignment
- Smith, T.F. and Waterman, M.S. (1981). "Identification of common molecular subsequences." Journal of Molecular Biology, 147(1), 195-197.
- Original paper describing the Smith-Waterman algorithm for local sequence alignment
- Bailey, T.L. and Elkan, C. (1994). "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 28-36.
- Reference for probabilistic motif discovery methods
- Lawrence, C.E. et al. (1993). "Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment." Science, 262(5131), 208-214.
- Foundational paper on statistical approaches to motif finding
- Bucher, P. (1990). "Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences." Journal of Molecular Biology, 212(4), 563-578.
- Reference for promoter element consensus sequences
- Mathelier, A. et al. (2016). "JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles." Nucleic Acids Research, 44(D1), D110-D115.
- Source for regulatory element patterns and matrices
- Cornish-Bowden, A. (1985). "Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984." Nucleic Acids Research, 13(9), 3021-3030.
- IUPAC ambiguous base notation standard
- Rice, P., Longden, I., and Bleasby, A. (2000). "EMBOSS: The European Molecular Biology Open Software Suite." Trends in Genetics, 16(6), 276-277.
- Reference for standard bioinformatics algorithms
- Cock, P.J.A. et al. (2009). "Biopython: freely available Python tools for computational molecular biology and bioinformatics." Bioinformatics, 25(11), 1422-1423.
- Primary reference for the Biopython library
- Pearson, W.R. and Lipman, D.J. (1988). "Improved tools for biological sequence comparison." Proceedings of the National Academy of Sciences, 85(8), 2444-2448.
- FASTA format reference
- Cock, P.J.A. et al. (2010). "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Nucleic Acids Research, 38(6), 1767-1771.
- FASTQ format specifications
- Benson, D.A. et al. (2013). "GenBank." Nucleic Acids Research, 41(D1), D36-D42.
- GenBank format reference
- Watson, J.D. and Crick, F.H.C. (1953). "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid." Nature, 171(4356), 737-738.
- Foundational paper on DNA structure
- Chargaff, E. (1950). "Chemical specificity of nucleic acids and mechanism of their enzymatic degradation." Experientia, 6(6), 201-209.
- Base composition rules
- Goldberg, M.L. (1979). "Sequence Analysis of Drosophila Histone Genes." Stanford University Ph.D. Dissertation.
- Early TATA box characterization
- Grosschedl, R. and Birnstiel, M.L. (1980). "Identification of regulatory sequences in the prelude sequences of an H2A histone gene by the study of specific deletion mutants in vivo." Proceedings of the National Academy of Sciences, 77(3), 1432-1436.
- CAAT box identification and characterization
- Durbin, R., Eddy, S.R., Krogh, A., and Mitchison, G. (1998). "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids." Cambridge University Press.
- Comprehensive review of sequence analysis methods
- D'haeseleer, P. (2006). "What are DNA sequence motifs?" Nature Biotechnology, 24(4), 423-425.
- Review of motif finding concepts and approaches
- Mount, D.W. (2004). "Bioinformatics: Sequence and Genome Analysis." Cold Spring Harbor Laboratory Press.
- Comprehensive textbook on sequence analysis
- Gusfield, D. (1997). "Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology." Cambridge University Press.
- Algorithmic foundations of sequence analysis

Note: These references represent the foundational work on which this tool is built. For current research and methods in bioinformatics, consult recent literature and reviews in your specific area of interest.