Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection

A cancer tumour contains thousands of genetic mutations. Despite advances in technology, the task of distinguishing driver mutations, which promote tumour growth, from passengers (neutral genetic mutations) is still done manually. This is a time-consuming process whe...

Bibliographic Details
Main Authors:	Gupta, Meenu; Wu, Hao; Arora, Simrann; Gupta, Akash; Chaudhary, Gopal; Hua, Qiaozhi
Format:	Online Article Text
Language:	English
Published:	Hindawi 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8337154/ https://www.ncbi.nlm.nih.gov/pubmed/34367540 http://dx.doi.org/10.1155/2021/8689873

_version_	1783733453630472192
author	Gupta, Meenu Wu, Hao Arora, Simrann Gupta, Akash Chaudhary, Gopal Hua, Qiaozhi
author_sort	Gupta, Meenu
collection	PubMed
description	A cancer tumour contains thousands of genetic mutations. Despite advances in technology, the task of distinguishing driver mutations, which promote tumour growth, from passengers (neutral genetic mutations) is still done manually. This is a time-consuming process in which pathologists interpret every genetic mutation from the clinical evidence by hand. This clinical evidence falls into a total of nine classes, but the classification criterion is still unknown. The main aim of this research is to propose a multiclass classifier that classifies genetic mutations based on clinical evidence (i.e., the text descriptions of these mutations) using Natural Language Processing (NLP) techniques. The dataset for this research is taken from Kaggle and is provided by the Memorial Sloan Kettering Cancer Center (MSKCC); world-class researchers and oncologists contributed to it. Three text transformation models, namely, CountVectorizer, TfidfVectorizer, and Word2Vec, are used to convert the text into a numerical matrix (for example, a matrix of token counts). Three machine learning classification models, namely, Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB), along with the Recurrent Neural Network (RNN) model of deep learning, are applied to the sparse matrix (keyword count representation) of the text descriptions. The accuracy of all the proposed classifiers is evaluated using the confusion matrix. Finally, the empirical results show that the RNN model performed better than the other proposed classifiers, with the highest accuracy of 70%.
format	Online Article Text
id	pubmed-8337154
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-8337154 2021-08-05 J Healthc Eng Hindawi 2021-07-27 Text en Copyright © 2021 Meenu Gupta et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8337154/ https://www.ncbi.nlm.nih.gov/pubmed/34367540 http://dx.doi.org/10.1155/2021/8689873
