
DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies

Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistenc...

Bibliographic Details
Main Authors: Han, Yi, Yang, Juze, Qian, Xinyi, Cheng, Wei-Chung, Liu, Shu-Hsuan, Hua, Xing, Zhou, Liyuan, Yang, Yaning, Wu, Qingbiao, Liu, Pengyuan, Lu, Yan
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486576/
https://www.ncbi.nlm.nih.gov/pubmed/30773592
http://dx.doi.org/10.1093/nar/gkz096
author Han, Yi
Yang, Juze
Qian, Xinyi
Cheng, Wei-Chung
Liu, Shu-Hsuan
Hua, Xing
Zhou, Liyuan
Yang, Yaning
Wu, Qingbiao
Liu, Pengyuan
Lu, Yan
collection PubMed
description Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao’s score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
format Online
Article
Text
id pubmed-6486576
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6486576 2019-05-01 DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies Han, Yi Yang, Juze Qian, Xinyi Cheng, Wei-Chung Liu, Shu-Hsuan Hua, Xing Zhou, Liyuan Yang, Yaning Wu, Qingbiao Liu, Pengyuan Lu, Yan Nucleic Acids Res Methods Online Oxford University Press 2019-05-07 2019-02-18 /pmc/articles/PMC6486576/ /pubmed/30773592 http://dx.doi.org/10.1093/nar/gkz096 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486576/
https://www.ncbi.nlm.nih.gov/pubmed/30773592
http://dx.doi.org/10.1093/nar/gkz096