
The accuracy of several multiple sequence alignment programs for proteins

BACKGROUND: There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it diff...


Bibliographic Details
Main Authors: Nuin, Paulo AS; Wang, Zhouzhi; Tillier, Elisabeth RM
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1633746/
https://www.ncbi.nlm.nih.gov/pubmed/17062146
http://dx.doi.org/10.1186/1471-2105-7-471
author Nuin, Paulo AS
Wang, Zhouzhi
Tillier, Elisabeth RM
collection PubMed
description BACKGROUND: There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs. RESULTS: We tested nine of the most often used protein alignment programs and compared their results using sequences generated with the simulation software Simprot which creates known alignments under realistic and controlled evolutionary scenarios. We have simulated more than 30000 alignment sets using various evolutionary histories in order to define strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE and the results relative to BAliBASE- and Simprot-generated data sets were consistent in most cases. CONCLUSION: Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.
format Text
id pubmed-1633746
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1633746 2006-11-06 The accuracy of several multiple sequence alignment programs for proteins Nuin, Paulo AS Wang, Zhouzhi Tillier, Elisabeth RM BMC Bioinformatics Research Article
BioMed Central 2006-10-24 /pmc/articles/PMC1633746/ /pubmed/17062146 http://dx.doi.org/10.1186/1471-2105-7-471 Text en Copyright © 2006 Nuin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The accuracy of several multiple sequence alignment programs for proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1633746/
https://www.ncbi.nlm.nih.gov/pubmed/17062146
http://dx.doi.org/10.1186/1471-2105-7-471