Monday, October 3, 2022


Biotechnology News Magazine

Top 5 DNA Sequencing Analysis Tools Used in the Biotech Industry


By Dr. Saptarshi Sinha*

In the era of informatics, one cannot ignore the information hidden inside a stretch of nucleotides. The tricky part is properly understanding how such simple sequences carry intricate pieces of information. Sequence analysis first came into the picture in the 1970s, when Saul B. Needleman and Christian D. Wunsch published their first algorithm for amino acid sequence alignment (Needleman and Wunsch 1970). But the big picture emerged with the start of the Human Genome Project (Olson 1993).
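As a rough illustration of the alignment idea, the sketch below computes a global alignment score with the dynamic-programming recurrence usually taught under the Needleman-Wunsch name. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example, not parameters taken from the 1970 paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score table.
    The scoring values are illustrative assumptions.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

With these scores, aligning ACGT against ACT gives three matches and one gap. The same recurrence works for amino acid sequences if the match and mismatch values are replaced by a substitution matrix.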

Following Needleman and Wunsch, computer scientists worldwide gradually took up the sequence-based problems that biologists still face today. Analysis of nucleotide sequences is now required at every step of research. From primer design for polynucleotide amplification and recognizing genes in a genome for functional study to sequence similarity analysis for comparative genomics, sequence analysis is crucial for every domain of life, including viruses. Here we discuss the top five DNA sequence analysis tools that revolutionized biology research.

One of the leading problems of genome analysis is recognizing genes along a stretch of nucleotide sequence. In prokaryotes, the problem is much easier because their genomes consist mostly of coding regions. In eukaryotes it is more challenging, because their genes contain both introns and exons. Several algorithms have been developed to address the problem. Broadly, their approaches rely either on statistical properties of DNA sequences or on homology-based methods.
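To make the prokaryotic case concrete, here is a minimal sketch of the simplest possible gene-finding step: scanning for open reading frames (ATG to the next in-frame stop codon) on the forward strand. It is not the method used by any of the tools discussed here; the function name and the `min_codons` threshold are assumptions for illustration, and a real gene finder would also scan the reverse strand and score each candidate statistically.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF runs from an ATG to the next in-frame stop codon; end is
    exclusive and includes the stop codon. min_codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))
```

In a eukaryotic sequence this scan is far less useful, since introns interrupt the reading frame, which is why the eukaryotic tools below model splice sites and exon structure explicitly.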

Genemark: It was developed in 1993 at the Georgia Institute of Technology by Mark Borodovsky and James McIninch (Borodovsky and McIninch 1993). It was the first algorithm to use a non-homogeneous Markov model to identify protein-coding and noncoding regions in prokaryotic genomes. With the help of this tool, prokaryotic genes can be classified as typical, highly typical or atypical on the basis of multivariate codon analysis, where the atypical class corresponds to horizontally transferred genes (Médigue et al. 1991).

Genemark uses gene recognition parameters specific to the target organism. It is an open-source application and is also used in the NCBI pipeline for prokaryotes.
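The idea of a non-homogeneous Markov model can be sketched in a few lines. The toy code below is not Genemark itself: it trains a first-order chain whose transition probabilities depend on codon position (three tables) for coding DNA, an ordinary single-table chain for noncoding DNA, and scores a sequence by the log-odds between the two. All function names, the model order, the add-one smoothing and the training sequences are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"  # assumes uppercase sequences with no ambiguity codes


def train(seqs, periods):
    """Estimate first-order transition probabilities, one table per phase.

    periods=3 gives a 3-periodic (non-homogeneous) chain for coding DNA;
    periods=1 gives an ordinary homogeneous chain.
    """
    # Start every count at 1 (add-one smoothing) to avoid log(0).
    counts = [{a: {b: 1.0 for b in ALPHABET} for a in ALPHABET}
              for _ in range(periods)]
    for s in seqs:
        for i in range(1, len(s)):
            counts[i % periods][s[i - 1]][s[i]] += 1
    return [{a: {b: t[a][b] / sum(t[a].values()) for b in ALPHABET}
             for a in ALPHABET} for t in counts]


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under the model."""
    periods = len(model)
    return sum(math.log(model[i % periods][seq[i - 1]][seq[i]])
               for i in range(1, len(seq)))


def coding_log_odds(seq, coding_model, noncoding_model):
    """Positive values favour the coding model, negative the noncoding one."""
    return (log_likelihood(seq, coding_model)
            - log_likelihood(seq, noncoding_model))


# Toy training data, invented purely for this example.
coding = train(["GCCGCCGCCGCCGCCGCC"], periods=3)
noncoding = train(["ATATATATATATATATAT"], periods=1)
print(coding_log_odds("GCCGCCGCC", coding, noncoding) > 0)
```

The sketch assumes the sequence being scored starts in the same reading frame as the training data. A practical gene finder would evaluate all frames on both strands and slide a window along the genome.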

Glimmer: Like Genemark, Gene Locator and Interpolated Markov Modeler (Glimmer) is another tool that transformed prokaryotic functional genomics. This algorithm was developed at Johns Hopkins University by Steven Salzberg (Salzberg et al. 1998). It is also based on Markov models. Here the order of the model increases at each step, with a separate estimate of the predictive power of each order. It was later modified for eukaryotic genomes as well. Glimmer is used at TIGR as a primary gene-finding tool.

A later variant of Glimmer specialises in small eukaryotic genomes such as that of Plasmodium. It is an open-source program and is used by the National Institutes of Health for medical research.

Grail: Eukaryotic genome analysis, on the other hand, was first revolutionised by Gene Recognition and Assembly Internet Link (Grail), which was developed at Oak Ridge National Laboratory by Ed Uberbacher (Uberbacher et al. 1996). Using a homology-based method, the algorithm was the first able to identify CpG islands, polyA sites, promoters, exons and frameshift mutations by comparing the query sequence with the human and mouse genomes.

It is incorporated in the Oak Ridge genome analysis pipeline, which transformed eukaryotic genome analysis. This pipeline also allows results from GrailEXP (a modified version) to be compared with those from Genscan (discussed below) for better understanding.
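Two of the sequence features mentioned in this article, GC content and CpG islands, are easy to compute directly. The sketch below is not how Grail detects them. It applies two thresholds often quoted in the literature (GC content above 50% and an observed/expected CpG ratio above 0.6), which are treated here as assumptions, and all function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(window, min_gc=0.5, min_ratio=0.6):
    """Apply the two assumed thresholds to a single window of sequence."""
    return gc_content(window) > min_gc and cpg_obs_exp(window) > min_ratio


print(gc_content("ATGC"), cpg_obs_exp("CGCGCGCG"))
```

In practice the test is applied to sliding windows of a few hundred bases rather than to short strings like these, and neighbouring windows that pass are merged into a single island.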

Genscan: The complexity of a sizeable eukaryotic genome demands a more accurate and time-efficient algorithm that also accounts for a gene's statistical and structural properties. This need was met by Genscan, developed at Stanford University by Chris Burge and Samuel Karlin (Burge and Karlin 1997). Based on a complex probabilistic model, this algorithm captures gene structure and, more precisely, the signals involved in transcription, translation and splicing.

Genscan can provide comparative gene models based on GC content and has been used as the primary gene prediction tool in the international Human Genome Project.

Genebuilder: This open-source algorithm, developed by Milanesi et al., is based on the ab initio gene prediction method (Milanesi et al. 1999). It considers various parameters, such as CpG islands, splice site data, GC content and repetitive elements, to identify a gene. Moreover, it identifies coding sequences from the relative frequencies of synonymous and non-synonymous substitutions.

Genebuilder can recognise gene structure on the basis of protein sequences. It also lets users predict gene structure interactively by adjusting various parameters.

These algorithms are continuously evolving and have led to the development of new and more efficient techniques. Nowadays, many tools are available that perform sequence analysis more efficiently and accurately. Sequence analysis has also motivated computer scientists to develop a new branch called DNA computing, an alternative to traditional electronic computation (Paun et al. 2005). Bioinformatics freelancers can help with analyzing complex data and interpreting results.

*Dr. Saptarshi Sinha is an experienced systems biologist on Kolabtree. He has substantial interdisciplinary research experience in microbiology as well as mathematical/computational biology. His research areas include biological network analysis, evolutionary game theory, bioinformatics, Monte Carlo simulation and basic mathematical modelling. He also has expertise in bacterial population dynamics, phage-bacteria interactions, molecular cloning, polymerase chain reaction, fluorescence-activated cell sorting and scanning electron microscopy.


  • Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3), 443-453.
  • Olson, M. V. (1993). The human genome project. Proceedings of the National Academy of Sciences, 90(10), 4338-4344.
  • Borodovsky, M., & McIninch, J. (1993). GENMARK: parallel gene recognition for both DNA strands. Computers & Chemistry, 17(2), 123-133.
  • Médigue, C., Rouxel, T., Vigier, P., Hénaut, A., & Danchin, A. (1991). Evidence for horizontal gene transfer in Escherichia coli speciation. Journal of Molecular Biology, 222(4), 851-856.
  • Salzberg, S. L., Delcher, A. L., Kasif, S., & White, O. (1998). Microbial gene identification using interpolated Markov models. Nucleic Acids Research, 26(2), 544-548.
  • Uberbacher, E. C., Xu, Y., & Mural, R. J. (1996). Discovering and understanding genes in human DNA sequence using GRAIL. Methods in Enzymology, 266, 259-281.
  • Burge, C., & Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology, 268(1), 78-94.
  • Milanesi, L., D’Angelo, D., & Rogozin, I. B. (1999). GeneBuilder: interactive in silico prediction of gene structure. Bioinformatics, 15(7), 612-621.
  • Paun, G., Rozenberg, G., & Salomaa, A. (2005). DNA computing: new computing paradigms. Springer Science & Business Media.
