BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
Background: Breast cancer is the foremost cancer in worldwide incidence, surpassing lung cancer notwithstanding the gender bias. One in four cancer cases among women are attributable to cancers of the breast, which are also the leading cause of death in women. Reliable options for early detection of...
Main authors: | Muthamilselvan, Sangeetha; Palaniappan, Ashok |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242386/ https://www.ncbi.nlm.nih.gov/pubmed/37287543 http://dx.doi.org/10.3389/fbinf.2023.1103493 |
_version_ | 1785054205412638720 |
---|---|
author | Muthamilselvan, Sangeetha Palaniappan, Ashok |
author_facet | Muthamilselvan, Sangeetha Palaniappan, Ashok |
author_sort | Muthamilselvan, Sangeetha |
collection | PubMed |
description | Background: Breast cancer is the foremost cancer in worldwide incidence, surpassing lung cancer notwithstanding the gender bias. One in four cancer cases among women are attributable to cancers of the breast, which are also the leading cause of death in women. Reliable options for early detection of breast cancer are needed. Methods: Using public-domain datasets, we screened transcriptomic profiles of breast cancer samples, and identified progression-significant linear and ordinal model genes using stage-informed models. We then applied a sequence of machine learning techniques, namely, feature selection, principal components analysis, and k-means clustering, to train a learner to discriminate “cancer” from “normal” based on expression levels of identified biomarkers. Results: Our computational pipeline yielded an optimal set of nine biomarker features for training the learner, namely, NEK2, PKMYT1, MMP11, CPA1, COL10A1, HSD17B13, CA4, MYOC, and LYVE1. Validation of the learned model on an independent test dataset yielded a performance of 99.5% accuracy. Blind validation on an out-of-domain external dataset yielded a balanced accuracy of 95.5%, demonstrating that the model has effectively reduced the dimensionality of the problem, and learnt the solution. The model was rebuilt using the full dataset, and then deployed as a web app for non-profit purposes at: https://apalania.shinyapps.io/brcadx/. To our knowledge, this is the best-performing freely available tool for the high-confidence diagnosis of breast cancer, and represents a promising aid to medical diagnosis. |
format | Online Article Text |
id | pubmed-10242386 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10242386 2023-06-07 BrcaDx: precise identification of breast cancer from expression data using a minimal set of features Muthamilselvan, Sangeetha Palaniappan, Ashok Front Bioinform Bioinformatics Background: Breast cancer is the foremost cancer in worldwide incidence, surpassing lung cancer notwithstanding the gender bias. One in four cancer cases among women are attributable to cancers of the breast, which are also the leading cause of death in women. Reliable options for early detection of breast cancer are needed. Methods: Using public-domain datasets, we screened transcriptomic profiles of breast cancer samples, and identified progression-significant linear and ordinal model genes using stage-informed models. We then applied a sequence of machine learning techniques, namely, feature selection, principal components analysis, and k-means clustering, to train a learner to discriminate “cancer” from “normal” based on expression levels of identified biomarkers. Results: Our computational pipeline yielded an optimal set of nine biomarker features for training the learner, namely, NEK2, PKMYT1, MMP11, CPA1, COL10A1, HSD17B13, CA4, MYOC, and LYVE1. Validation of the learned model on an independent test dataset yielded a performance of 99.5% accuracy. Blind validation on an out-of-domain external dataset yielded a balanced accuracy of 95.5%, demonstrating that the model has effectively reduced the dimensionality of the problem, and learnt the solution. The model was rebuilt using the full dataset, and then deployed as a web app for non-profit purposes at: https://apalania.shinyapps.io/brcadx/. To our knowledge, this is the best-performing freely available tool for the high-confidence diagnosis of breast cancer, and represents a promising aid to medical diagnosis. Frontiers Media S.A. 2023-05-23 /pmc/articles/PMC10242386/ /pubmed/37287543 http://dx.doi.org/10.3389/fbinf.2023.1103493 Text en Copyright © 2023 Muthamilselvan and Palaniappan.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioinformatics Muthamilselvan, Sangeetha Palaniappan, Ashok BrcaDx: precise identification of breast cancer from expression data using a minimal set of features |
title | BrcaDx: precise identification of breast cancer from expression data using a minimal set of features |
title_sort | brcadx: precise identification of breast cancer from expression data using a minimal set of features |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242386/ https://www.ncbi.nlm.nih.gov/pubmed/37287543 http://dx.doi.org/10.3389/fbinf.2023.1103493 |
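
The description in this record reports plain accuracy (99.5%) on the independent test set and balanced accuracy (95.5%) on the external dataset. As a minimal sketch of how the two metrics differ, and not the authors' code, the Python below computes both from confusion-matrix counts. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on "cancer") and specificity (recall on "normal")."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical, class-imbalanced counts (NOT results from the paper):
# 100 "cancer" samples (90 detected) and 10 "normal" samples (8 recognised).
tp, fn, tn, fp = 90, 10, 8, 2

print(f"accuracy          = {accuracy(tp, fn, tn, fp):.3f}")
print(f"balanced accuracy = {balanced_accuracy(tp, fn, tn, fp):.3f}")
```

Because balanced accuracy weights each class equally, it is less flattering than plain accuracy when one class dominates the dataset and the minority class is classified less well, as in the hypothetical counts above.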