Fast and sensitive multiple alignment of large genomic sequences

BACKGROUND: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an inc...

Bibliographic Details
Main Authors: Brudno, Michael; Chapman, Michael; Göttgens, Berthold; Batzoglou, Serafim; Morgenstern, Burkhard
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521198/
https://www.ncbi.nlm.nih.gov/pubmed/14693042
http://dx.doi.org/10.1186/1471-2105-4-66
author Brudno, Michael
Chapman, Michael
Göttgens, Berthold
Batzoglou, Serafim
Morgenstern, Burkhard
collection PubMed
description BACKGROUND: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method. RESULTS: Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure. CONCLUSION: We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
format Text
id pubmed-521198
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-521198 2004-10-04 Fast and sensitive multiple alignment of large genomic sequences Brudno, Michael Chapman, Michael Göttgens, Berthold Batzoglou, Serafim Morgenstern, Burkhard BMC Bioinformatics Methodology Article BioMed Central 2003-12-23 /pmc/articles/PMC521198/ /pubmed/14693042 http://dx.doi.org/10.1186/1471-2105-4-66 Text en Copyright © 2003 Brudno et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Fast and sensitive multiple alignment of large genomic sequences
topic Methodology Article