
Machine learning techniques for identification of carcinogenic mutations, which cause breast adenocarcinoma



Bibliographic details
Main authors: Shah, Asghar Ali; Malik, Hafiz Abid Mahmood; Mohammad, AbdulHafeez; Khan, Yaser Daanial; Alourani, Abdullah
Format: Online; Article; Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273792/
https://www.ncbi.nlm.nih.gov/pubmed/35817838
http://dx.doi.org/10.1038/s41598-022-15533-8
collection PubMed
description Breast adenocarcinoma is the most common cancer in women. According to a United States survey, more than 282,000 breast cancer patients are registered each year, most of them women. Detecting cancer at an early stage saves many lives. Each cell carries its genetic code in the form of gene sequences, and changes in these sequences may lead to cancer. Errors during replication or recombination sometimes cause a permanent change in the nucleotide sequence of the genome, called a mutation; mutations that can lead to cancer are called cancer driver mutations. This study develops a framework for the early detection of breast adenocarcinoma using machine learning techniques. Every gene has a specific nucleotide sequence, and previous studies have identified 99 genes whose mutations can lead to breast adenocarcinoma. The dataset used here comes from 4127 human samples, from both men and women, drawn from more than 12 cohorts, and contains a total of 6170 mutations in gene sequences. Decision Tree, Random Forest, and Gaussian Naïve Bayes classifiers are applied to these gene sequences and assessed with three evaluation methods: independent set testing, self-consistency testing, and tenfold cross-validation. The evaluation metrics calculated include accuracy, specificity, sensitivity, and the Matthews correlation coefficient (MCC). The decision tree algorithm achieves the best accuracy, 99%, under each evaluation method.
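The evaluation metrics and the tenfold cross-validation protocol named in the description can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes binary labels where 1 marks a carcinogenic (driver) mutation and 0 a non-carcinogenic one, and the function names `confusion_counts`, `binary_metrics`, and `k_fold_indices` are hypothetical. The fold generator also assumes the samples have already been shuffled. As the terms are commonly used, self-consistency testing scores a model on the same data it was trained on and independent set testing scores it on a held-out set, so both reduce to calling `binary_metrics` on the corresponding predictions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN); assumption: 1 = carcinogenic (positive), 0 = negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    if not y_true:
        raise ValueError("at least one sample is required")
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the test set exactly once."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # The first n % k folds get one extra sample so every index is used.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
    # → {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'mcc': 0.5}
```

In the example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy, sensitivity and specificity are all 0.75 and MCC is (3·3 − 1·1)/√(4·4·4·4) = 0.5. MCC is worth reporting alongside accuracy because it stays informative when the positive and negative classes are imbalanced.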
id pubmed-9273792
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Sci Rep (Nature Publishing Group UK), 2022-07-11
License: © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.