Machine learning techniques for identification of carcinogenic mutations, which cause breast adenocarcinoma
Breast adenocarcinoma is the most common of all cancers that occur in women. According to the United States of America survey, more than 282,000 breast cancer patients are registered each year; most of them are women. Detection of cancer at its early stage saves many lives. Each cell contains the ge...
Main authors: | Shah, Asghar Ali; Malik, Hafiz Abid Mahmood; Mohammad, AbdulHafeez; Khan, Yaser Daanial; Alourani, Abdullah |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273792/ https://www.ncbi.nlm.nih.gov/pubmed/35817838 http://dx.doi.org/10.1038/s41598-022-15533-8 |
_version_ | 1784745155636494336 |
---|---|
author | Shah, Asghar Ali; Malik, Hafiz Abid Mahmood; Mohammad, AbdulHafeez; Khan, Yaser Daanial; Alourani, Abdullah |
author_sort | Shah, Asghar Ali |
collection | PubMed |
description | Breast adenocarcinoma is the most common of all cancers that occur in women. According to the United States of America survey, more than 282,000 breast cancer patients are registered each year; most of them are women. Detection of cancer at its early stage saves many lives. Each cell contains the genetic code in the form of gene sequences. Changes in the gene sequences may lead to cancer. Replication and/or recombination in the gene base sometimes lead to a permanent change in the nucleotide sequence of the genome, called a mutation. Cancer driver mutations can lead to cancer. The proposed study develops a framework for the early detection of breast adenocarcinoma using machine learning techniques. Every gene has a specific sequence of nucleotides. A total of 99 genes are identified in various studies whose mutations can lead to breast adenocarcinoma. This study uses the dataset taken from 4127 human samples, including men and women from more than 12 cohorts. A total of 6170 mutations in gene sequences are used in this study. Decision Tree, Random Forest, and Gaussian Naïve Bayes are applied to these gene sequences using three evaluation methods: independent set testing, self-consistency testing, and tenfold cross-validation testing. Evaluation metrics such as accuracy, specificity, sensitivity, and the Matthews correlation coefficient are calculated. The decision tree algorithm obtains the best accuracy of 99% for each evaluation method. |
format | Online Article Text |
id | pubmed-9273792 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9273792 2022-07-13 Sci Rep Article Nature Publishing Group UK 2022-07-11 /pmc/articles/PMC9273792/ /pubmed/35817838 http://dx.doi.org/10.1038/s41598-022-15533-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Machine learning techniques for identification of carcinogenic mutations, which cause breast adenocarcinoma |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273792/ https://www.ncbi.nlm.nih.gov/pubmed/35817838 http://dx.doi.org/10.1038/s41598-022-15533-8 |
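
The description field names accuracy, specificity, sensitivity, and the Matthews correlation coefficient as the evaluation metrics. As a minimal sketch of how these four quantities are commonly derived from binary confusion-matrix counts, the following Python may help; the function names (`confusion_counts`, `binary_metrics`) and the example labels are hypothetical and are not taken from the article, which does not publish its code in this record.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from the four counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    # Sensitivity: share of true positives among all actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = driver mutation, 0 = non-driver.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are unbalanced.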