
Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties

A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric pr...


Bibliographic Details
Main Authors: Tong, Wenxu; Wei, Ying; Murga, Leonel F.; Ondrechen, Mary Jo; Williams, Ronald J.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2612599/
https://www.ncbi.nlm.nih.gov/pubmed/19148270
http://dx.doi.org/10.1371/journal.pcbi.1000266
author Tong, Wenxu
Wei, Ying
Murga, Leonel F.
Ondrechen, Mary Jo
Williams, Ronald J.
collection PubMed
description A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
format Text
id pubmed-2612599
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2612599 2009-01-16 Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties Tong, Wenxu Wei, Ying Murga, Leonel F. Ondrechen, Mary Jo Williams, Ronald J. PLoS Comput Biol Research Article A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data.
Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers. Public Library of Science 2009-01-16 /pmc/articles/PMC2612599/ /pubmed/19148270 http://dx.doi.org/10.1371/journal.pcbi.1000266 Text en Tong et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2612599/
https://www.ncbi.nlm.nih.gov/pubmed/19148270
http://dx.doi.org/10.1371/journal.pcbi.1000266