Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example
Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is measured using cross-validation or an independent test, in which experimental data from different sources are mixed and randomly split into training and test se...
Main Authors: | Zou, Guoyang; Zou, Yang; Ma, Chenglong; Zhao, Jiaojiao; Li, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687584/ https://www.ncbi.nlm.nih.gov/pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682 |
_version_ | 1784618203486355456 |
---|---|
author | Zou, Guoyang Zou, Yang Ma, Chenglong Zhao, Jiaojiao Li, Lei |
author_facet | Zou, Guoyang Zou, Yang Ma, Chenglong Zhao, Jiaojiao Li, Lei |
author_sort | Zou, Guoyang |
collection | PubMed |
description | Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is measured using cross-validation or an independent test, in which experimental data from different sources are mixed and randomly split into training and test sets. However, the self-reported performance of most classifiers based on this measure is generally higher than their performance when applied to new experimental data. This suggests that the cross-validation method overestimates the generalization ability of a classifier. Here, we propose a generalization estimate method, dubbed the experiment-split test, in which the experimental sources for the training set are different from those for the test set, so that the test set simulates data derived from a new experiment. We took the prediction of the lysine methylome (Kme) as an example and developed a deep learning-based Kme site predictor (called DeepKme) with outstanding performance. We assessed the experiment-split test by comparing it with the cross-validation method. We found that the performance measured using the experiment-split test is lower than that measured using cross-validation. As the test data of the experiment-split method were derived from an independent experimental source, this method could reflect the generalization of the predictor. Therefore, we believe that the experiment-split method can be applied to benchmark the practical performance of a given PTM model. DeepKme is freely accessible via https://github.com/guoyangzou/DeepKme. |
format | Online Article Text |
id | pubmed-8687584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8687584 2021-12-21 Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example Zou, Guoyang Zou, Yang Ma, Chenglong Zhao, Jiaojiao Li, Lei PLoS Comput Biol Research Article Public Library of Science 2021-12-08 /pmc/articles/PMC8687584/ /pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682 Text en © 2021 Zou et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687584/ https://www.ncbi.nlm.nih.gov/pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682 |
work_keys_str_mv | AT zouguoyang developmentofanexperimentsplitmethodforbenchmarkingthegeneralizationofaptmsitepredictorlysinemethylomeasanexample AT zouyang developmentofanexperimentsplitmethodforbenchmarkingthegeneralizationofaptmsitepredictorlysinemethylomeasanexample AT machenglong developmentofanexperimentsplitmethodforbenchmarkingthegeneralizationofaptmsitepredictorlysinemethylomeasanexample AT zhaojiaojiao developmentofanexperimentsplitmethodforbenchmarkingthegeneralizationofaptmsitepredictorlysinemethylomeasanexample AT lilei developmentofanexperimentsplitmethodforbenchmarkingthegeneralizationofaptmsitepredictorlysinemethylomeasanexample |
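
The description above contrasts a random split of pooled data with an experiment-split test, in which training and test sites come from different experimental sources. The following is a minimal sketch of that idea, not the authors' DeepKme code: it holds out one experimental source at a time, which is only one possible way to keep sources disjoint. The record layout `(site_id, experiment_source, label)`, the function name `leave_one_experiment_out`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (site_id, experiment_source, label).
Record = Tuple[str, str, int]


def leave_one_experiment_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_source, train, test) with disjoint experimental sources."""
    # Group records by the experiment they came from.
    by_source: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_source[rec[1]].append(rec)

    # Each source takes one turn as the "new experiment" used for testing.
    for held_out in sorted(by_source):
        test = by_source[held_out]
        train = [
            rec
            for source, recs in by_source.items()
            if source != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Toy data; the site identifiers and source names are made up.
records: List[Record] = [
    ("P1_K10", "exp_A", 1),
    ("P2_K55", "exp_A", 0),
    ("P3_K7", "exp_B", 1),
    ("P4_K90", "exp_B", 0),
    ("P5_K31", "exp_C", 1),
]

folds = list(leave_one_experiment_out(records))
for held_out, train, test in folds:
    train_sources = sorted({rec[1] for rec in train})
    print(held_out, "train sources:", train_sources, "test size:", len(test))
```

A random split over the same records could place sites from `exp_A` in both the training and the test set, which is the mixing the description identifies as a source of overestimated performance. Libraries such as scikit-learn offer group-aware splitters (for example `LeaveOneGroupOut`) that serve a similar purpose; the standard-library version here is only meant to make the grouping logic explicit.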