Scientific References

Core Algorithms and Methods

Sequence Alignment

  1. Needleman, S.B. and Wunsch, C.D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." Journal of Molecular Biology, 48(3), 443-453.

    • Original paper describing the Needleman-Wunsch algorithm for global sequence alignment
  2. Smith, T.F. and Waterman, M.S. (1981). "Identification of common molecular subsequences." Journal of Molecular Biology, 147(1), 195-197.

    • Original paper describing the Smith-Waterman algorithm for local sequence alignment

Motif Finding

  1. Bailey, T.L. and Elkan, C. (1994). "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 28-36.

    • Original MEME paper; reference for probabilistic (expectation-maximization) motif discovery
  2. Lawrence, C.E. et al. (1993). "Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment." Science, 262(5131), 208-214.

    • Foundational paper on statistical approaches to motif finding

Regulatory Elements

  1. Bucher, P. (1990). "Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences." Journal of Molecular Biology, 212(4), 563-578.

    • Reference for promoter element consensus sequences
  2. Mathelier, A. et al. (2016). "JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles." Nucleic Acids Research, 44(D1), D110-D115.

    • Source for regulatory element patterns and matrices

DNA Sequence Analysis

  1. Cornish-Bowden, A. (1985). "Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984." Nucleic Acids Research, 13(9), 3021-3030.

    • IUPAC ambiguous base notation standard
  2. Rice, P., Longden, I., and Bleasby, A. (2000). "EMBOSS: The European Molecular Biology Open Software Suite." Trends in Genetics, 16(6), 276-277.

    • Reference for the EMBOSS suite of standard sequence analysis tools

Tools and Libraries

Biopython

  1. Cock, P.J.A. et al. (2009). "Biopython: freely available Python tools for computational molecular biology and bioinformatics." Bioinformatics, 25(11), 1422-1423.
    • Primary citation for the Biopython library

File Formats

  1. Pearson, W.R. and Lipman, D.J. (1988). "Improved tools for biological sequence comparison." Proceedings of the National Academy of Sciences, 85(8), 2444-2448.
    • FASTA format reference
  2. Cock, P.J.A. et al. (2010). "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Nucleic Acids Research, 38(6), 1767-1771.
    • FASTQ format specifications
  3. Benson, D.A. et al. (2013). "GenBank." Nucleic Acids Research, 41(D1), D36-D42.
    • GenBank format reference

Molecular Biology Concepts

DNA Structure and Properties

  1. Watson, J.D. and Crick, F.H.C. (1953). "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid." Nature, 171(4356), 737-738.
    • Foundational paper on DNA structure
  2. Chargaff, E. (1950). "Chemical specificity of nucleic acids and mechanism of their enzymatic degradation." Experientia, 6(6), 201-209.
    • Base composition rules

Regulatory Elements and Gene Expression

  1. Goldberg, M.L. (1979). "Sequence Analysis of Drosophila Histone Genes." Stanford University Ph.D. Dissertation.
    • Early TATA box characterization
  2. Grosschedl, R. and Birnstiel, M.L. (1980). "Identification of regulatory sequences in the prelude sequences of an H2A histone gene by the study of specific deletion mutants in vivo." Proceedings of the National Academy of Sciences, 77(3), 1432-1436.
    • CAAT box identification and characterization

Review Articles

  1. Durbin, R., Eddy, S.R., Krogh, A., and Mitchison, G. (1998). "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids." Cambridge University Press.
    • Comprehensive review of sequence analysis methods
  2. D'haeseleer, P. (2006). "What are DNA sequence motifs?" Nature Biotechnology, 24(4), 423-425.
    • Review of motif finding concepts and approaches

Additional Resources

  1. Mount, D.W. (2004). "Bioinformatics: Sequence and Genome Analysis." Cold Spring Harbor Laboratory Press.
    • Comprehensive textbook on sequence analysis
  2. Gusfield, D. (1997). "Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology." Cambridge University Press.
    • Algorithmic foundations of sequence analysis

Note: These references represent the foundational work upon which this tool is built. For the most current research and methods in bioinformatics, users are encouraged to consult recent literature and reviews in their specific area of interest.