
Current cancer driver variant predictors learn to recognize driver genes instead of functional variants

Bibliographic Details
Main authors:	Raimondi, Daniele, Passemiers, Antoine, Fariselli, Piero, Moreau, Yves
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7807764/ https://www.ncbi.nlm.nih.gov/pubmed/33441128 http://dx.doi.org/10.1186/s12915-020-00930-0

collection	PubMed
description	BACKGROUND: Identifying variants that drive tumor progression (driver variants) and distinguishing them from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and for precision oncology. Various bioinformatics methods have attempted to solve this complex task. RESULTS: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias that prevents machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect arises because, in these data sets, the driver variants map to a few driver genes while the passenger variants spread across thousands of genes, so simply learning to recognize driver genes yields almost perfect predictions. CONCLUSIONS: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that every gene covered by the data contains both driver and passenger variants. On this data set, we show that the tested predictors suffer a significant drop in performance, which should not be read as poorer modeling but rather as a correction of unwarranted optimism. Finally, we propose a weighting procedure that completely eliminates gene effects on such predictions, thereby precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that this task is indeed still open. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12915-020-00930-0.
id	pubmed-7807764
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Biol, Research Article, BioMed Central, 2021-01-13
rights	© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
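The abstract's central argument can be illustrated with a small toy sketch. It argues that a predictor can score well simply by memorizing which genes harbor drivers, and that a gene-balanced data set plus a per-gene weighting removes this shortcut. Everything below is hypothetical: the gene names, the variant counts, the even/odd per-gene train/test split, the "gene-only" predictor, and the weighting scheme are all illustrative assumptions. The weighting gives each gene's drivers and passengers equal total weight, and it is not necessarily the authors' data set or their exact weighting procedure. AUC is computed here as the (optionally weighted) probability that a driver outscores a passenger, with ties counted as one half. Exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

# A variant is a hypothetical (gene, label) pair: label 1 = driver, 0 = passenger.


def split_per_gene(variants):
    """Deterministically alternate each gene's variants between train and test."""
    by_gene = defaultdict(list)
    for gene, label in variants:
        by_gene[gene].append(label)
    train, test = [], []
    for gene, labels in by_gene.items():
        for i, label in enumerate(labels):
            (train if i % 2 == 0 else test).append((gene, label))
    return train, test


def fit_gene_only(train):
    """Toy 'predictor' that ignores everything but the gene: it memorizes driver rates."""
    counts = defaultdict(lambda: [0, 0])  # gene -> [drivers, total]
    for gene, label in train:
        counts[gene][0] += label
        counts[gene][1] += 1
    prior = Fraction(sum(d for d, _ in counts.values()), len(train))
    rates = {gene: Fraction(d, n) for gene, (d, n) in counts.items()}
    return lambda gene: rates.get(gene, prior)  # unseen genes get the global rate


def weighted_auc(scores, labels, weights):
    """Weighted probability that a driver outscores a passenger (ties count 1/2)."""
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    num = Fraction(0)
    for sp, wp in pos:
        for sn, wn in neg:
            if sp > sn:
                num += wp * wn
            elif sp == sn:
                num += wp * wn / 2
    return num / (sum(w for _, w in pos) * sum(w for _, w in neg))


def gene_balancing_weights(variants):
    """Assumed scheme: within each gene, drivers and passengers each get total weight 1/2.

    Only genes containing both classes end up balanced; a gene with a single
    class puts its whole weight of 1/2 on that class.
    """
    n = defaultdict(lambda: [0, 0])  # gene -> [passengers, drivers]
    for gene, label in variants:
        n[gene][label] += 1
    return [Fraction(1, 2 * n[gene][label]) for gene, label in variants]


def evaluate(variants):
    """Return (plain AUC, gene-weighted AUC) of the gene-only predictor on the test half."""
    train, test = split_per_gene(variants)
    score = fit_gene_only(train)
    scores = [score(gene) for gene, _ in test]
    labels = [label for _, label in test]
    plain = weighted_auc(scores, labels, [Fraction(1)] * len(test))
    weighted = weighted_auc(scores, labels, gene_balancing_weights(test))
    return plain, weighted


def make_gene(name, labels):
    return [(name, label) for label in labels]


# Biased toy set: all drivers sit in 4 genes, passengers spread over 200 genes.
biased = []
for i in range(4):
    biased += make_gene(f"DRV{i}", [1] * 10 + [0] * 2)
for i in range(200):
    biased += make_gene(f"PSG{i}", [0, 0])

# Gene-balanced toy set: every gene holds both classes, in unequal proportions.
mixed = []
for i in range(10):
    mixed += make_gene(f"RICH{i}", [1, 1, 1, 1, 0, 0])
    mixed += make_gene(f"POOR{i}", [0, 0, 0, 0, 1, 1])

if __name__ == "__main__":
    for name, data in [("biased", biased), ("gene-balanced", mixed)]:
        plain, weighted = evaluate(data)
        print(f"{name}: AUC={float(plain):.3f}, gene-weighted AUC={float(weighted):.3f}")
```

On the biased toy set, the gene-only predictor reaches an AUC of about 0.990 from gene identity alone. On the toy set where every gene contains both classes, the same predictor drops to 0.667. The gene-balancing weights then bring it to exactly 0.500, because any score that depends only on the gene ranks each gene's equally weighted drivers and passengers identically. In this sketch the weights cannot rescue the biased set, which stays at 0.990, since passenger-only genes have no drivers to balance against. This is consistent with pairing the weighting with a data set in which every gene carries both classes.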
