BrcaDx: precise identification of breast cancer from expression data using a minimal set of features

Background: Breast cancer has the highest incidence of any cancer worldwide, surpassing lung cancer despite occurring predominantly in women. One in four cancer cases among women is attributable to breast cancer, which is also the leading cause of cancer death in women. Reliable options for early detection of...

Bibliographic Details
Main authors: Muthamilselvan, Sangeetha; Palaniappan, Ashok
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242386/
https://www.ncbi.nlm.nih.gov/pubmed/37287543
http://dx.doi.org/10.3389/fbinf.2023.1103493
_version_ 1785054205412638720
author Muthamilselvan, Sangeetha
Palaniappan, Ashok
author_facet Muthamilselvan, Sangeetha
Palaniappan, Ashok
author_sort Muthamilselvan, Sangeetha
collection PubMed
description Background: Breast cancer has the highest incidence of any cancer worldwide, surpassing lung cancer despite occurring predominantly in women. One in four cancer cases among women is attributable to breast cancer, which is also the leading cause of cancer death in women. Reliable options for early detection of breast cancer are needed. Methods: Using public-domain datasets, we screened transcriptomic profiles of breast cancer samples and used stage-informed linear and ordinal models to identify genes significantly associated with progression. We then applied a sequence of machine learning techniques, namely feature selection, principal components analysis, and k-means clustering, to train a learner to discriminate “cancer” from “normal” based on the expression levels of the identified biomarkers. Results: Our computational pipeline yielded an optimal set of nine biomarker features for training the learner, namely NEK2, PKMYT1, MMP11, CPA1, COL10A1, HSD17B13, CA4, MYOC, and LYVE1. Validation of the learned model on an independent test dataset yielded an accuracy of 99.5%. Blind validation on an out-of-domain external dataset yielded a balanced accuracy of 95.5%, demonstrating that the model has effectively reduced the dimensionality of the problem and learnt the solution. The model was rebuilt using the full dataset and then deployed as a web app for non-profit purposes at: https://apalania.shinyapps.io/brcadx/. To our knowledge, this is the best-performing freely available tool for the high-confidence diagnosis of breast cancer, and it represents a promising aid to medical diagnosis.
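The description quotes two different metrics: plain accuracy on the test set and balanced accuracy on the external set. The two diverge when the classes are imbalanced, which is common when tumour samples far outnumber normal samples. As a minimal sketch, balanced accuracy can be computed as the mean of sensitivity and specificity. The labels below are made-up toy data, not results from the study, and the function names are hypothetical helpers introduced only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred, positive="cancer"):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy, imbalanced example (assumed data): 8 cancer samples, 2 normal samples.
y_true = ["cancer"] * 8 + ["normal"] * 2
# Every cancer sample is called correctly; one normal sample is miscalled as cancer.
y_pred = ["cancer"] * 8 + ["cancer", "normal"]

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case a single error on the minority class leaves accuracy at 0.9 but pulls balanced accuracy down to 0.75, which is why balanced accuracy is the more informative figure for an imbalanced validation set.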
format Online
Article
Text
id pubmed-10242386
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10242386 2023-06-07 BrcaDx: precise identification of breast cancer from expression data using a minimal set of features Muthamilselvan, Sangeetha Palaniappan, Ashok Front Bioinform Bioinformatics Frontiers Media S.A. 2023-05-23 /pmc/articles/PMC10242386/ /pubmed/37287543 http://dx.doi.org/10.3389/fbinf.2023.1103493 Text en Copyright © 2023 Muthamilselvan and Palaniappan.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioinformatics
Muthamilselvan, Sangeetha
Palaniappan, Ashok
BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title_full BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title_fullStr BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title_full_unstemmed BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title_short BrcaDx: precise identification of breast cancer from expression data using a minimal set of features
title_sort brcadx: precise identification of breast cancer from expression data using a minimal set of features
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242386/
https://www.ncbi.nlm.nih.gov/pubmed/37287543
http://dx.doi.org/10.3389/fbinf.2023.1103493
work_keys_str_mv AT muthamilselvansangeetha brcadxpreciseidentificationofbreastcancerfromexpressiondatausingaminimalsetoffeatures
AT palaniappanashok brcadxpreciseidentificationofbreastcancerfromexpressiondatausingaminimalsetoffeatures