
Supervised learning is an accurate method for network-based gene classification


Bibliographic Details
Main authors: Liu, Renming; Mancuso, Christopher A; Yannakopoulos, Anna; Johnson, Kayla A; Krishnan, Arjun
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7267831/
https://www.ncbi.nlm.nih.gov/pubmed/32129827
http://dx.doi.org/10.1093/bioinformatics/btaa150
collection PubMed
description BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.
CONTACT: arjun@msu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
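The abstract contrasts two ways of using a network to classify genes: supervised learning, where each gene's row of the adjacency matrix (its full network connectivity) serves as a feature vector for a classifier, and label propagation, which diffuses scores outward from known positive genes. The sketch below illustrates that contrast on a hypothetical eight-gene toy network in pure Python. It assumes L2-regularized logistic regression as the classifier and a random-walk-with-restart form of label propagation; the gene names, edges, and the parameters `lr`, `l2`, `restart`, `epochs` and `iters` are illustrative assumptions, and this is not the authors' released code. The node2vec embedding approach is not covered.

```python
import math

# Hypothetical toy gene network: two tightly knit modules (g0-g3, g4-g7)
# joined by a single bridge edge g3-g4. Not real biological data.
EDGES = [
    ("g0", "g1"), ("g0", "g2"), ("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
    ("g3", "g4"),
    ("g4", "g5"), ("g4", "g6"), ("g5", "g6"), ("g5", "g7"), ("g6", "g7"),
]


def build_network(edges):
    """Return sorted gene names and a dense symmetric 0/1 adjacency matrix."""
    genes = sorted({g for edge in edges for g in edge})
    idx = {g: i for i, g in enumerate(genes)}
    adj = [[0.0] * len(genes) for _ in genes]
    for u, v in edges:
        adj[idx[u]][idx[v]] = adj[idx[v]][idx[u]] = 1.0
    return genes, adj


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_logreg(X, y, lr=0.5, l2=0.01, epochs=500):
    """L2-regularized logistic regression fit by full-batch gradient descent."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # gradient of log-loss w.r.t. the logit
            for k in range(n_feat):
                gw[k] += err * xi[k]
            gb += err
        w = [wk - lr * (gk / m + l2 * wk) for wk, gk in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def random_walk_with_restart(adj, seeds, restart=0.5, iters=100):
    """Propagate scores from seed genes: p <- (1 - r) * W p + r * p0,
    where W is the column-normalized adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(iters):
        # Each gene j splits its current score equally among its neighbors.
        spread = [sum(adj[i][j] * p[j] / deg[j] for j in range(n) if deg[j] > 0)
                  for i in range(n)]
        p = [(1 - restart) * s + restart * q for s, q in zip(spread, p0)]
    return p


def score_genes(edges, positives, negatives):
    """Score every gene with both approaches; returns (supervised, propagation) dicts."""
    genes, adj = build_network(edges)
    idx = {g: i for i, g in enumerate(genes)}
    # Supervised learning: each gene's full adjacency row is its feature vector.
    X = [adj[idx[g]] for g in positives + negatives]
    y = [1.0] * len(positives) + [0.0] * len(negatives)
    w, b = train_logreg(X, y)
    supervised = {g: sigmoid(dot(w, adj[idx[g]]) + b) for g in genes}
    # Label propagation: only the positive genes act as seeds; negatives are unused.
    scores = random_walk_with_restart(adj, {idx[g] for g in positives})
    propagation = {g: scores[idx[g]] for g in genes}
    return supervised, propagation


sl, lp = score_genes(EDGES, positives=["g0", "g1"], negatives=["g6", "g7"])
for gene in ("g2", "g5"):
    print(gene, f"supervised={sl[gene]:.3f}", f"propagation={lp[gene]:.3f}")
```

Running it prints both scores for `g2` (adjacent to both positives) and `g5` (inside the module of the negatives); both approaches should rank `g2` above `g5`. One design difference is visible in `score_genes`: the classifier learns from the negative examples, whereas the propagation step uses only the positive seeds. On a real interaction network, the dense lists would be replaced by sparse matrices.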
id pubmed-7267831
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics, Original Papers
dates 2020-06-09, 2020-06, 2020-04-14
rights © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers