
Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example

Many computational classifiers have been developed to predict different types of post-translational modification sites. Their performance is measured using cross-validation or an independent test, in which experimental data from different sources are mixed and randomly split into training and test se...


Bibliographic Details
Main authors: Zou, Guoyang; Zou, Yang; Ma, Chenglong; Zhao, Jiaojiao; Li, Lei
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687584/
https://www.ncbi.nlm.nih.gov/pubmed/34879076
http://dx.doi.org/10.1371/journal.pcbi.1009682
_version_ 1784618203486355456
author Zou, Guoyang
Zou, Yang
Ma, Chenglong
Zhao, Jiaojiao
Li, Lei
collection PubMed
description Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is measured using cross-validation or an independent test, in which experimental data from different sources are mixed and randomly split into training and test sets. However, the self-reported performance of most classifiers under this measure is generally higher than their performance on new experimental data. This suggests that the cross-validation method overestimates the generalization ability of a classifier. Here, we propose a generalization estimation method, dubbed the experiment-split test, in which the experimental sources for the training set differ from those for the test set, so that the test set simulates data derived from a new experiment. We took the prediction of the lysine methylome (Kme) as an example and developed a deep learning-based Kme site predictor (called DeepKme) with outstanding performance. We assessed the experiment-split test by comparing it with the cross-validation method and found that the performance measured using the experiment-split test is lower than that measured using cross-validation. Because the test data of the experiment-split method were derived from an independent experimental source, this method could reflect the generalization of the predictor. Therefore, we believe that the experiment-split method can be applied to benchmark the practical performance of a given PTM model. DeepKme is freely accessible at https://github.com/guoyangzou/DeepKme.
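The difference between the two splitting schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the `source` field, the function names, and the toy data are all assumptions made for illustration, and each site is assumed to come from exactly one experimental source.

```python
import random


def random_split(records, test_fraction=0.25, seed=0):
    """Conventional split: pool sites from all sources, shuffle, then cut."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def experiment_split(records, test_sources):
    """Experiment-split: hold out every site from the named sources."""
    held_out = set(test_sources)
    train = [r for r in records if r["source"] not in held_out]
    test = [r for r in records if r["source"] in held_out]
    return train, test


def shared_sources(train, test):
    """Experimental sources that contribute sites to both sets."""
    return {r["source"] for r in train} & {r["source"] for r in test}


# Hypothetical toy data: 40 sites spread evenly over 4 experimental sources.
records = [
    {"site": f"P{i:03d}_K{10 + i}", "source": f"exp{i % 4}", "label": i % 2}
    for i in range(40)
]

rand_train, rand_test = random_split(records)
exp_train, exp_test = experiment_split(records, test_sources=["exp3"])

print("random split, shared sources:", sorted(shared_sources(rand_train, rand_test)))
print("experiment split, shared sources:", sorted(shared_sources(exp_train, exp_test)))
```

With the experiment split, no source contributes to both sets, so the test set behaves like data from a new experiment. A random split will usually leave the same sources on both sides, which is the situation the abstract argues leads to overestimated generalization.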
format Online
Article
Text
id pubmed-8687584
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8687584 2021-12-21 PLoS Comput Biol Research Article
Public Library of Science 2021-12-08 /pmc/articles/PMC8687584/ /pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682 Text en © 2021 Zou et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example
topic Research Article