Supervised learning is an accurate method for network-based gene classification
BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene a...
Main Authors: | Liu, Renming; Mancuso, Christopher A; Yannakopoulos, Anna; Johnson, Kayla A; Krishnan, Arjun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2020
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7267831/ https://www.ncbi.nlm.nih.gov/pubmed/32129827 http://dx.doi.org/10.1093/bioinformatics/btaa150 |
_version_ | 1783541485981925376 |
---|---|
author | Liu, Renming Mancuso, Christopher A Yannakopoulos, Anna Johnson, Kayla A Krishnan, Arjun |
author_facet | Liu, Renming Mancuso, Christopher A Yannakopoulos, Anna Johnson, Kayla A Krishnan, Arjun |
author_sort | Liu, Renming |
collection | PubMed |
description | BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT: arjun@msu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7267831 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7267831 2020-06-09 Supervised learning is an accurate method for network-based gene classification Liu, Renming Mancuso, Christopher A Yannakopoulos, Anna Johnson, Kayla A Krishnan, Arjun Bioinformatics Original Papers BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT: arjun@msu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-06 2020-04-14 /pmc/articles/PMC7267831/ /pubmed/32129827 http://dx.doi.org/10.1093/bioinformatics/btaa150 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Liu, Renming Mancuso, Christopher A Yannakopoulos, Anna Johnson, Kayla A Krishnan, Arjun Supervised learning is an accurate method for network-based gene classification |
title | Supervised learning is an accurate method for network-based gene classification |
title_full | Supervised learning is an accurate method for network-based gene classification |
title_fullStr | Supervised learning is an accurate method for network-based gene classification |
title_full_unstemmed | Supervised learning is an accurate method for network-based gene classification |
title_short | Supervised learning is an accurate method for network-based gene classification |
title_sort | supervised learning is an accurate method for network-based gene classification |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7267831/ https://www.ncbi.nlm.nih.gov/pubmed/32129827 http://dx.doi.org/10.1093/bioinformatics/btaa150 |
work_keys_str_mv | AT liurenming supervisedlearningisanaccuratemethodfornetworkbasedgeneclassification AT mancusochristophera supervisedlearningisanaccuratemethodfornetworkbasedgeneclassification AT yannakopoulosanna supervisedlearningisanaccuratemethodfornetworkbasedgeneclassification AT johnsonkaylaa supervisedlearningisanaccuratemethodfornetworkbasedgeneclassification AT krishnanarjun supervisedlearningisanaccuratemethodfornetworkbasedgeneclassification |