
A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives

Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of t...


Bibliographic Details
Main Authors: Thompson, Julie D., Linard, Benjamin, Lecompte, Odile, Poch, Olivier
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3069049/
https://www.ncbi.nlm.nih.gov/pubmed/21483869
http://dx.doi.org/10.1371/journal.pone.0018093
author Thompson, Julie D.
Linard, Benjamin
Lecompte, Odile
Poch, Olivier
collection PubMed
description Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high-throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
format Text
id pubmed-3069049
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3069049 2011-04-11 A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives Thompson, Julie D. Linard, Benjamin Lecompte, Odile Poch, Olivier PLoS One Research Article Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high-throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions.
We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies. Public Library of Science 2011-03-31 /pmc/articles/PMC3069049/ /pubmed/21483869 http://dx.doi.org/10.1371/journal.pone.0018093 Text en Thompson et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3069049/
https://www.ncbi.nlm.nih.gov/pubmed/21483869
http://dx.doi.org/10.1371/journal.pone.0018093