Genomic multiple sequence alignments: refinement using a genetic algorithm

BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic difference...

Bibliographic Details
Main Authors:	Wang, Chunlin; Lefkowitz, Elliot J
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208854/ https://www.ncbi.nlm.nih.gov/pubmed/16086841 http://dx.doi.org/10.1186/1471-2105-6-200

_version_	1782124916354580480
author	Wang, Chunlin Lefkowitz, Elliot J
author_facet	Wang, Chunlin Lefkowitz, Elliot J
author_sort	Wang, Chunlin
collection	PubMed
description	BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences, as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which then serve as anchors for aligning the regions between them. The resulting alignment is globally correct but in many cases locally suboptimal. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. RESULTS: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as to refine a multiple alignment of 15 Orthopoxvirus genomic sequences, approximately 260,000 nucleotides in length, that had initially been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; significantly, however, this occurred while the overall alignment length decreased, through the removal of gaps, by approximately 200 gapped regions representing roughly 1,300 gaps. CONCLUSION: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
format	Text
id	pubmed-1208854
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1208854 2005-09-15 Genomic multiple sequence alignments: refinement using a genetic algorithm Wang, Chunlin Lefkowitz, Elliot J BMC Bioinformatics Software BioMed Central 2005-08-08 /pmc/articles/PMC1208854/ /pubmed/16086841 http://dx.doi.org/10.1186/1471-2105-6-200 Text en Copyright © 2005 Wang and Lefkowitz; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Software Wang, Chunlin Lefkowitz, Elliot J Genomic multiple sequence alignments: refinement using a genetic algorithm
title	Genomic multiple sequence alignments: refinement using a genetic algorithm
title_sort	genomic multiple sequence alignments: refinement using a genetic algorithm
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208854/ https://www.ncbi.nlm.nih.gov/pubmed/16086841 http://dx.doi.org/10.1186/1471-2105-6-200
work_keys_str_mv	AT wangchunlin genomicmultiplesequencealignmentsrefinementusingageneticalgorithm AT lefkowitzelliotj genomicmultiplesequencealignmentsrefinementusingageneticalgorithm
