A systematic benchmark of machine learning methods for protein–RNA interaction prediction

RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While nume...

Bibliographic Details
Main authors:	Horlacher, Marc; Cantini, Giulia; Hesse, Julian; Schinke, Patrick; Goedert, Nicolas; Londhe, Shubhankar; Moyon, Lambert; Marsico, Annalisa
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516373/ https://www.ncbi.nlm.nih.gov/pubmed/37635383 http://dx.doi.org/10.1093/bib/bbad307

collection	PubMed
description	RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP–RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
format	Online Article Text
id	pubmed-10516373
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10516373 2023-09-23 Brief Bioinform Review Oxford University Press 2023-08-26 /pmc/articles/PMC10516373/ /pubmed/37635383 http://dx.doi.org/10.1093/bib/bbad307 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A systematic benchmark of machine learning methods for protein–RNA interaction prediction
topic	Review
