Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimiz...

Bibliographic Details
Main authors:	Lin, Hsin-Nan; Notredame, Cédric; Chang, Jia-Ming; Sung, Ting-Yi; Hsu, Wen-Lian
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229492/ https://www.ncbi.nlm.nih.gov/pubmed/22163274 http://dx.doi.org/10.1371/journal.pone.0027872

author	Lin, Hsin-Nan; Notredame, Cédric; Chang, Jia-Ming; Sung, Ting-Yi; Hsu, Wen-Lian
author_sort	Lin, Hsin-Nan
collection	PubMed
description	Most sequence alignment tools can successfully align protein sequences with high levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Currently available methods differ essentially in how they measure similarity between aligned residues: substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words that are found across a sizeable fraction of the considered dataset and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
format	Online Article Text
id	pubmed-3229492
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3229492 2011-12-12 PLoS One Research Article Public Library of Science 2011-12-02 /pmc/articles/PMC3229492/ /pubmed/22163274 http://dx.doi.org/10.1371/journal.pone.0027872 Text en Lin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229492/ https://www.ncbi.nlm.nih.gov/pubmed/22163274 http://dx.doi.org/10.1371/journal.pone.0027872
