A statistical score for assessing the quality of multiple sequence alignments

BACKGROUND: Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees etc. Although the automatic construction of a multiple sequence alignment for a...


Bibliographic Details
Main authors: Ahola, Virpi; Aittokallio, Tero; Vihinen, Mauno; Uusipaikka, Esa
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687212/
https://www.ncbi.nlm.nih.gov/pubmed/17081313
http://dx.doi.org/10.1186/1471-2105-7-484
_version_ 1782131192091377664
author Ahola, Virpi
Aittokallio, Tero
Vihinen, Mauno
Uusipaikka, Esa
collection PubMed
description BACKGROUND: Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments. RESULTS: To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment, using an importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using the novel alignment quality score and the commonly used sum-of-pairs method. According to these results, the Mafft strategy L-INS-i outperforms the other methods, although the differences among Probcons, TCoffee and Muscle are mostly insignificant. The novel alignment quality score gives results similar to those of the sum-of-pairs method. CONCLUSION: The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments.
format Text
id pubmed-1687212
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1687212 2006-12-07 A statistical score for assessing the quality of multiple sequence alignments Ahola, Virpi; Aittokallio, Tero; Vihinen, Mauno; Uusipaikka, Esa BMC Bioinformatics Methodology Article
BioMed Central 2006-11-03 /pmc/articles/PMC1687212/ /pubmed/17081313 http://dx.doi.org/10.1186/1471-2105-7-484 Text en Copyright © 2006 Ahola et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A statistical score for assessing the quality of multiple sequence alignments
title_sort statistical score for assessing the quality of multiple sequence alignments
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687212/
https://www.ncbi.nlm.nih.gov/pubmed/17081313
http://dx.doi.org/10.1186/1471-2105-7-484