A systematic benchmark of machine learning methods for protein–RNA interaction prediction

RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While nume...

Bibliographic Details
Main authors: Horlacher, Marc, Cantini, Giulia, Hesse, Julian, Schinke, Patrick, Goedert, Nicolas, Londhe, Shubhankar, Moyon, Lambert, Marsico, Annalisa
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516373/
https://www.ncbi.nlm.nih.gov/pubmed/37635383
http://dx.doi.org/10.1093/bib/bbad307
author Horlacher, Marc
Cantini, Giulia
Hesse, Julian
Schinke, Patrick
Goedert, Nicolas
Londhe, Shubhankar
Moyon, Lambert
Marsico, Annalisa
collection PubMed
description RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine-learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP–RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
format Online
Article
Text
id pubmed-10516373
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10516373 2023-09-23 Brief Bioinform Review Oxford University Press 2023-08-26 /pmc/articles/PMC10516373/ /pubmed/37635383 http://dx.doi.org/10.1093/bib/bbad307 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A systematic benchmark of machine learning methods for protein–RNA interaction prediction
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516373/
https://www.ncbi.nlm.nih.gov/pubmed/37635383
http://dx.doi.org/10.1093/bib/bbad307