
Genomic multiple sequence alignments: refinement using a genetic algorithm

BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic difference...


Bibliographic Details
Main authors: Wang, Chunlin, Lefkowitz, Elliot J
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208854/
https://www.ncbi.nlm.nih.gov/pubmed/16086841
http://dx.doi.org/10.1186/1471-2105-6-200
author Wang, Chunlin
Lefkowitz, Elliot J
collection PubMed
description BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences, as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which then serve as anchors for aligning the regions between them. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.
RESULTS: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as to refine a multiple alignment of 15 Orthopoxvirus genomic sequences, approximately 260,000 nucleotides in length, that had initially been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; significantly, however, this occurred while the overall alignment length decreased – through the removal of gaps – by approximately 200 gapped regions representing roughly 1,300 gaps.
CONCLUSION: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
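The identify-then-refine idea in the abstract can be illustrated with a toy sketch. This is not GenAlignRefine: the function names (`find_fuzzy_regions`, `sp_score`, `mutate`, `refine`), the window size, the identity threshold, and the scoring values are all hypothetical choices made for illustration. A simple sum-of-pairs score stands in for the COFFEE score, the T-Coffee realignment step is omitted, and the search is a serial mutation-and-selection loop without crossover or parallelism.

```python
import random


def column_identity(column):
    """Fraction of rows that carry the most common non-gap residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(column)


def find_fuzzy_regions(alignment, window=5, threshold=0.7):
    """Return merged half-open (start, end) column ranges with low mean identity."""
    ncols = len(alignment[0])
    ident = [column_identity([row[i] for row in alignment]) for i in range(ncols)]
    regions = []
    for start in range(ncols - window + 1):
        end = start + window
        if sum(ident[start:end]) / window < threshold:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge overlapping windows
            else:
                regions.append((start, end))
    return regions


def sp_score(block, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score (an assumed stand-in for COFFEE, not the real objective)."""
    total = 0
    for col in zip(*block):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == "-" and b == "-":
                    continue
                if a == "-" or b == "-":
                    total += gap
                else:
                    total += match if a == b else mismatch
    return total


def mutate(block, rng):
    """Move one gap in a random row; residue order and row length are preserved."""
    rows = list(block)
    r = rng.randrange(len(rows))
    gaps = [i for i, c in enumerate(rows[r]) if c == "-"]
    if gaps:
        chars = list(rows[r])
        del chars[rng.choice(gaps)]
        chars.insert(rng.randrange(len(chars) + 1), "-")
        rows[r] = "".join(chars)
    return tuple(rows)


def refine(block, generations=200, pop_size=20, seed=0):
    """Keep the best-scoring candidates each generation (elitist selection)."""
    rng = random.Random(seed)
    population = [tuple(block)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda b: (sp_score(b), b), reverse=True)[:pop_size]
    return list(population[0])


if __name__ == "__main__":
    block = ["ACGT-A", "A-CGTA", "ACG-TA"]
    refined = refine(block)
    print(sp_score(block), sp_score(refined))
    print("\n".join(refined))
```

Because the best candidate is always carried forward, the score of the returned block can never be lower than that of the input block. Each fuzzy region is refined independently of the others, which is what makes a one-region-per-worker parallel layout, like the cluster-based design the abstract mentions, a natural fit.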
format Text
id pubmed-1208854
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1208854 2005-09-15 BMC Bioinformatics Software BioMed Central 2005-08-08 /pmc/articles/PMC1208854/ /pubmed/16086841 http://dx.doi.org/10.1186/1471-2105-6-200 Text en Copyright © 2005 Wang and Lefkowitz; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genomic multiple sequence alignments: refinement using a genetic algorithm
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208854/
https://www.ncbi.nlm.nih.gov/pubmed/16086841
http://dx.doi.org/10.1186/1471-2105-6-200