Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example

Bibliographic Details
Main authors:	Zou, Guoyang, Zou, Yang, Ma, Chenglong, Zhao, Jiaojiao, Li, Lei
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687584/ https://www.ncbi.nlm.nih.gov/pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682

author	Zou, Guoyang Zou, Yang Ma, Chenglong Zhao, Jiaojiao Li, Lei
collection	PubMed
description	Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is usually measured using cross-validation or an independent test, in which experimental data from different sources are pooled and randomly split into training and test sets. However, the self-reported performance of most classifiers under this measure is generally higher than their performance when applied to new experimental data, which suggests that cross-validation overestimates a classifier's generalization ability. Here, we proposed a generalization-estimation method, dubbed the experiment-split test, in which the experimental sources of the training set differ from those of the test set, so that the test set simulates data derived from a new experiment. We took the prediction of the lysine methylome (Kme) as an example and developed a deep learning-based Kme site predictor, called DeepKme, with outstanding performance. We assessed the experiment-split test by comparing it with the cross-validation method and found that the performance measured using the experiment-split test is lower than that measured by cross-validation. Because the test data of the experiment-split method were derived from an independent experimental source, this method could reflect the generalization of the predictor. Therefore, we believe that the experiment-split method can be applied to benchmark the practical performance of a given PTM model. DeepKme is freely accessible via https://github.com/guoyangzou/DeepKme.
format	Online Article Text
id	pubmed-8687584
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8687584 2021-12-21; PLoS Comput Biol; Research Article
Public Library of Science 2021-12-08 /pmc/articles/PMC8687584/ /pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682 Text en © 2021 Zou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687584/ https://www.ncbi.nlm.nih.gov/pubmed/34879076 http://dx.doi.org/10.1371/journal.pcbi.1009682
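The record describes the experiment-split test only at a high level: training and test sets come from different experimental sources instead of one randomly split pool. The Python sketch below contrasts the two splitting schemes under stated assumptions. Each record is a hypothetical `Site(site_id, source, label)` tuple, and the function names are illustrative, not taken from DeepKme. The `drop_shared_sites` guard, which removes from training any site that a held-out experiment also reported, is an added assumption that the record does not specify.

```python
"""Hypothetical sketch: random split vs. experiment-split of PTM site data.

Assumptions (not from the record): each record is a (site_id, source, label)
tuple, where `source` names the experiment that reported the site.
"""
import random
from typing import Iterator, List, NamedTuple, Sequence, Set, Tuple


class Site(NamedTuple):
    site_id: str  # hypothetical protein/position key, e.g. "P1_K4"
    source: str   # hypothetical label of the experiment that reported the site
    label: int    # 1 = methylated lysine (positive), 0 = negative


def random_split(sites: Sequence[Site], test_fraction: float, seed: int = 0
                 ) -> Tuple[List[Site], List[Site]]:
    """Conventional split: pool all sources, shuffle, cut once."""
    pool = list(sites)
    random.Random(seed).shuffle(pool)
    n_test = int(round(len(pool) * test_fraction))
    return pool[n_test:], pool[:n_test]  # (train, test)


def experiment_split(sites: Sequence[Site], test_sources: Set[str],
                     drop_shared_sites: bool = True
                     ) -> Tuple[List[Site], List[Site]]:
    """Experiment-split: the test set contains only held-out sources.

    With drop_shared_sites=True, any site_id that a held-out source also
    reported is removed from training (an assumed extra leakage guard).
    """
    test = [s for s in sites if s.source in test_sources]
    train = [s for s in sites if s.source not in test_sources]
    if drop_shared_sites:
        test_ids = {s.site_id for s in test}
        train = [s for s in train if s.site_id not in test_ids]
    return train, test


def leave_one_experiment_out(sites: Sequence[Site]
                             ) -> Iterator[Tuple[str, List[Site], List[Site]]]:
    """Yield (held_out_source, train, test) for every distinct source."""
    for source in sorted({s.source for s in sites}):
        train, test = experiment_split(sites, {source})
        yield source, train, test


if __name__ == "__main__":
    data = [
        Site("P1_K4", "expA", 1), Site("P1_K9", "expA", 0),
        Site("P2_K27", "expB", 1), Site("P2_K36", "expB", 0),
        Site("P1_K4", "expC", 1), Site("P3_K20", "expC", 0),
    ]
    train, test = experiment_split(data, {"expC"})
    print(sorted(s.site_id for s in train))  # ['P1_K9', 'P2_K27', 'P2_K36']
    print(sorted(s.site_id for s in test))   # ['P1_K4', 'P3_K20']
```

Scoring a model on each `leave_one_experiment_out` fold, rather than on a pooled random split, is one way to approximate performance on data from a new experiment. With the random split, sites from the same experiment (and duplicate reports of the same site) can land on both sides of the split.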

Similar Items