Assessing predictors for new post translational modification sites: A case study on hydroxylation

Post-translational modification (PTM) sites have become popular for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation i...

Bibliographic Details
Main authors: Piovesan, Damiano; Hatos, Andras; Minervini, Giovanni; Quaglia, Federica; Monzon, Alexander Miguel; Tosatto, Silvio C. E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332089/
https://www.ncbi.nlm.nih.gov/pubmed/32569263
http://dx.doi.org/10.1371/journal.pcbi.1007967
author Piovesan, Damiano
Hatos, Andras
Minervini, Giovanni
Quaglia, Federica
Monzon, Alexander Miguel
Tosatto, Silvio C. E.
collection PubMed
description Post-translational modification (PTM) sites have become popular for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and evaluate their performance on new experimentally determined sites. As a guide for effective experimental design, predictors require both high specificity and sensitivity. However, the self-reported performance may often not be indicative of prediction quality and detection of new sites is not guaranteed. We have benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance is found to widely overestimate the real accuracy measured on independent datasets. No predictor performs better than random on new examples, indicating the refined models do not sufficiently generalize to detect new sites. The number of false positives is high and precision low, in particular for non-collagen proteins whose motifs are not conserved. As hydroxylation site predictors do not generalize for new data, caution is advised when using PTM predictors in the absence of independent evaluations, in particular for highly specific sites involved in signalling.
format Online
Article
Text
id pubmed-7332089
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7332089 2020-07-15 Assessing predictors for new post translational modification sites: A case study on hydroxylation Piovesan, Damiano Hatos, Andras Minervini, Giovanni Quaglia, Federica Monzon, Alexander Miguel Tosatto, Silvio C. E. PLoS Comput Biol Research Article
Public Library of Science 2020-06-22 /pmc/articles/PMC7332089/ /pubmed/32569263 http://dx.doi.org/10.1371/journal.pcbi.1007967 Text en © 2020 Piovesan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Assessing predictors for new post translational modification sites: A case study on hydroxylation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332089/
https://www.ncbi.nlm.nih.gov/pubmed/32569263
http://dx.doi.org/10.1371/journal.pcbi.1007967