
Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations

SIMPLE SUMMARY: Genes dictate the grounds of life by comprising molecular bases which encode proteins. A mutation represents a gene modification that may influence the protein function. Cancer occurs when the mutation triggers uncontrolled cellular growth. Judging by the cancer expansion, mutations...

Bibliographic Details
Main authors: Dragomir, Ionut; Akbar, Adnan; Cassidy, John W.; Patel, Nirmesh; Clifford, Harry W.; Contino, Gianmarco
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8199862/
https://www.ncbi.nlm.nih.gov/pubmed/34205004
http://dx.doi.org/10.3390/cancers13112779
_version_ 1783707474948259840
author Dragomir, Ionut
Akbar, Adnan
Cassidy, John W.
Patel, Nirmesh
Clifford, Harry W.
Contino, Gianmarco
collection PubMed
description SIMPLE SUMMARY: Genes dictate the grounds of life by comprising molecular bases which encode proteins. A mutation represents a gene modification that may influence the protein function. Cancer occurs when the mutation triggers uncontrolled cellular growth. Judging by the cancer expansion, mutations labelled as drivers confer a growth advantage, while passengers do not contribute to this augmentation. The aim of this study is methodological: it assesses the usefulness of a classification method for distinguishing between driver and passenger mutations. Based on 51 molecular characteristics of mutations and genes, including 3 novel features, multiple machine learning algorithms were used to determine whether these characteristics biologically represent the driver mutations and how they impact the classification procedure. To test the ability of the present methodology, the same steps were applied to an independent dataset. The results showed that both gene- and mutation-level characteristics are representative of the driver mutations, and the proposed approach achieved more than 80% accuracy in finding the true type of mutation. The evidence suggests that machine learning methods can be used to gain knowledge from mutational data, with the aim of delivering more targeted cancer treatment.
ABSTRACT: Sporadic cancer develops from the accrual of somatic mutations. Out of all small-scale somatic aberrations in coding regions, 95% are base substitutions, with 90% being missense mutations. While multiple studies focused on the importance of this mutation type, a machine learning method based on the number of protein–protein interactions (PPIs) has not been fully explored. This study aims to develop an improved computational method for driver identification, validation and evaluation (DRIVE), which is compared to other methods for assessing its performance. 
DRIVE aims at distinguishing between driver and passenger mutations using a feature-based learning approach comprising two levels of biological classification for a pan-cancer assessment of somatic mutations. Gene-level features include the maximum number of protein–protein interactions, the biological process and the type of post-translational modifications (PTMs), while mutation-level features are based on pathogenicity scores. Multiple supervised classification algorithms were trained on Genomics Evidence Neoplasia Information Exchange (GENIE) project data and then tested on an independent dataset from The Cancer Genome Atlas (TCGA) study. Finally, the most powerful classifier using DRIVE was evaluated on a benchmark dataset, which showed a better overall performance compared to other state-of-the-art methodologies; however, considerable care must be taken due to the reduced size of the dataset. DRIVE outlines the outstanding potential that multiple levels of a feature-based learning model will play in the future of oncology-based precision medicine.
format Online
Article
Text
id pubmed-8199862
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8199862 2021-06-14 Cancers (Basel) Article MDPI 2021-06-03 /pmc/articles/PMC8199862/ /pubmed/34205004 http://dx.doi.org/10.3390/cancers13112779 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8199862/
https://www.ncbi.nlm.nih.gov/pubmed/34205004
http://dx.doi.org/10.3390/cancers13112779