Predicting and improving the protein sequence alignment quality by support vector regression

Bibliographic Details
Main Authors: Lee, Minho, Jeong, Chan-seok, Kim, Dongsup
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222655/
https://www.ncbi.nlm.nih.gov/pubmed/18053160
http://dx.doi.org/10.1186/1471-2105-8-471
_version_ 1782149365654093824
author Lee, Minho
Jeong, Chan-seok
Kim, Dongsup
author_facet Lee, Minho
Jeong, Chan-seok
Kim, Dongsup
author_sort Lee, Minho
collection PubMed
description BACKGROUND: Successful protein structure prediction by comparative modeling requires not only a good template protein with known structure but also an accurate sequence alignment between the query protein and the template. Alignment accuracy is known to vary significantly with the choice of alignment parameters such as the gap opening and gap extension penalties. Because the accuracy of a sequence alignment is typically measured by comparing it with the corresponding structure alignment, there is no good way to evaluate alignment accuracy without knowing the structure of the query protein, which is not available at the time of structure prediction. Moreover, no single alignment parameter option always yields the optimal alignment. RESULTS: In this work, we develop a method to predict the quality of the alignment between a query and a template. We train support vector regression (SVR) models to predict MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into an (n + 1)-dimensional feature vector, which is then used as input to the trained SVR model to predict the alignment quality. Performance is evaluated by various measures, including the Pearson correlation coefficient between observed and predicted MaxSub scores, which reaches 0.945. For each query-template pair, 48 alignments are generated by varying the alignment options. The trained SVR models are then applied to predict the MaxSub scores of these alignments and to select the best alignment option for that specific query-template pair. This adaptive selection procedure improves MaxSub scores by 7.4% compared with using the single best parameter option for all query-template pairs.
CONCLUSION: The present work demonstrates that alignment quality can be predicted with reasonable accuracy. Our method is useful not only for selecting the optimal alignment parameters for a chosen template based on predicted alignment quality, but also for filtering out problematic templates that are unsuitable for structure prediction because of poor alignment accuracy. The method is implemented as part of FORECAST, a server for fold recognition, and is freely available on the web at
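The adaptive selection step and the correlation-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pearson`, `select_best_option`, and `toy_predictor` are hypothetical, the option keys are assumed to be (gap opening, gap extension) pairs, and `toy_predictor` is only a stand-in for a trained SVR model that would return a predicted MaxSub score.

```python
from math import sqrt
from typing import Callable, Dict, Sequence, Tuple

Alignment = Tuple[str, str]   # (gapped query, gapped template); assumed representation
Option = Tuple[int, int]      # (gap opening, gap extension); assumed option key


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def select_best_option(
    alignments: Dict[Option, Alignment],
    predict_score: Callable[[Alignment], float],
) -> Tuple[Option, float]:
    """Score every candidate alignment and return the best (option, score)."""
    if not alignments:
        raise ValueError("no candidate alignments given")
    scored = {opt: predict_score(aln) for opt, aln in alignments.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def toy_predictor(alignment: Alignment) -> float:
    """Stand-in for a trained SVR model: fraction of gap-free columns."""
    query, template = alignment
    gap_free = sum(1 for q, t in zip(query, template) if q != "-" and t != "-")
    return gap_free / len(query)


# Two hypothetical alignments of one query-template pair under different options.
candidates = {
    (8, 1): ("AC--DEFG", "ACKKDEFG"),
    (10, 1): ("ACDE-FG", "ACDEKFG"),
}
best_option, best_score = select_best_option(candidates, toy_predictor)
```

In the setting the abstract describes, `candidates` would hold the 48 alignments generated for one query-template pair, and `pearson` would be applied to the observed and predicted MaxSub scores of a held-out set.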
format Text
id pubmed-2222655
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2222655 2008-02-02 Predicting and improving the protein sequence alignment quality by support vector regression Lee, Minho Jeong, Chan-seok Kim, Dongsup BMC Bioinformatics Research Article BioMed Central 2007-12-03 /pmc/articles/PMC2222655/ /pubmed/18053160 http://dx.doi.org/10.1186/1471-2105-8-471 Text en Copyright © 2007 Lee et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Lee, Minho
Jeong, Chan-seok
Kim, Dongsup
Predicting and improving the protein sequence alignment quality by support vector regression
title Predicting and improving the protein sequence alignment quality by support vector regression
title_full Predicting and improving the protein sequence alignment quality by support vector regression
title_fullStr Predicting and improving the protein sequence alignment quality by support vector regression
title_full_unstemmed Predicting and improving the protein sequence alignment quality by support vector regression
title_short Predicting and improving the protein sequence alignment quality by support vector regression
title_sort predicting and improving the protein sequence alignment quality by support vector regression
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222655/
https://www.ncbi.nlm.nih.gov/pubmed/18053160
http://dx.doi.org/10.1186/1471-2105-8-471
work_keys_str_mv AT leeminho predictingandimprovingtheproteinsequencealignmentqualitybysupportvectorregression
AT jeongchanseok predictingandimprovingtheproteinsequencealignmentqualitybysupportvectorregression
AT kimdongsup predictingandimprovingtheproteinsequencealignmentqualitybysupportvectorregression