
Phenomapping of Patients with Primary Breast Cancer Using Machine Learning-Based Unsupervised Cluster Analysis

Primary breast cancer (PBC) is a heterogeneous disease at the clinical, histopathological, and molecular levels. The improved classification of PBC might be important to identify subgroups of the disease, relevant to patient management. Machine learning algorithms may allow a better understanding of...


Bibliographic Details
Main authors: Ferro, Sara; Bottigliengo, Daniele; Gregori, Dario; Fabricio, Aline S. C.; Gion, Massimo; Baldi, Ileana
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8067194/
https://www.ncbi.nlm.nih.gov/pubmed/33916398
http://dx.doi.org/10.3390/jpm11040272
_version_ 1783682745555222528
author Ferro, Sara
Bottigliengo, Daniele
Gregori, Dario
Fabricio, Aline S. C.
Gion, Massimo
Baldi, Ileana
author_sort Ferro, Sara
collection PubMed
description Primary breast cancer (PBC) is a heterogeneous disease at the clinical, histopathological, and molecular levels. The improved classification of PBC might be important to identify subgroups of the disease, relevant to patient management. Machine learning algorithms may allow a better understanding of the relationships within heterogeneous clinical syndromes. This work aims to show the potential of unsupervised learning techniques for improving classification in PBC. A dataset of 712 women with PBC is used as a motivating example. A set of variables containing biological prognostic parameters is considered to define groups of individuals. Four different clustering methods are used: K-means, self-organising maps, hierarchical agglomerative (HAC), and Gaussian mixture models clustering. HAC outperforms the other clustering methods. With an optimal partitioning parameter, the methods identify two clusters with different clinical profiles. Patients in the first cluster are younger and have lower values of the oestrogen receptor (ER) and progesterone receptor (PgR) than patients in the second cluster. Moreover, cathepsin D values are lower in the first cluster. The three most important variables identified by the HAC are: age, ER, and PgR. Unsupervised learning seems a suitable alternative for the analysis of PBC data, opening up new perspectives in the particularly active domain of dissecting clinical heterogeneity.
format Online
Article
Text
id pubmed-8067194
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8067194 2021-04-25 Phenomapping of Patients with Primary Breast Cancer Using Machine Learning-Based Unsupervised Cluster Analysis Ferro, Sara Bottigliengo, Daniele Gregori, Dario Fabricio, Aline S. C. Gion, Massimo Baldi, Ileana J Pers Med Article Primary breast cancer (PBC) is a heterogeneous disease at the clinical, histopathological, and molecular levels. The improved classification of PBC might be important to identify subgroups of the disease, relevant to patient management. Machine learning algorithms may allow a better understanding of the relationships within heterogeneous clinical syndromes. This work aims to show the potential of unsupervised learning techniques for improving classification in PBC. A dataset of 712 women with PBC is used as a motivating example. A set of variables containing biological prognostic parameters is considered to define groups of individuals. Four different clustering methods are used: K-means, self-organising maps, hierarchical agglomerative (HAC), and Gaussian mixture models clustering. HAC outperforms the other clustering methods. With an optimal partitioning parameter, the methods identify two clusters with different clinical profiles. Patients in the first cluster are younger and have lower values of the oestrogen receptor (ER) and progesterone receptor (PgR) than patients in the second cluster. Moreover, cathepsin D values are lower in the first cluster. The three most important variables identified by the HAC are: age, ER, and PgR. Unsupervised learning seems a suitable alternative for the analysis of PBC data, opening up new perspectives in the particularly active domain of dissecting clinical heterogeneity. MDPI 2021-04-05 /pmc/articles/PMC8067194/ /pubmed/33916398 http://dx.doi.org/10.3390/jpm11040272 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Phenomapping of Patients with Primary Breast Cancer Using Machine Learning-Based Unsupervised Cluster Analysis
topic Article
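The description field in this record names hierarchical agglomerative clustering (HAC) as the best-performing method and reports that an optimal partitioning parameter yields two clusters. The sketch below illustrates that general idea only; it is not the authors' pipeline. Everything in it is an assumption made for illustration: the toy (age, ER, PgR) rows are invented, average linkage and Euclidean distance are arbitrary choices, and the mean silhouette width stands in for whatever criterion the study actually used to pick the number of clusters.

```python
import math

# Invented toy rows (age, ER, PgR). These are NOT data from the study.
younger_low = [(40 + i, 15 + 2 * i, 10 + (i * 3) % 7) for i in range(10)]
older_high = [(62 + i, 140 + 3 * i, 110 + (i * 5) % 11) for i in range(10)]
rows = younger_low + older_high


def standardize(rows):
    """Z-score each column so age and receptor values share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def hac_average(points, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = standardize(rows)
# Try a few partition sizes and keep the one with the highest silhouette.
scores = {k: silhouette(points, hac_average(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
labels = hac_average(points, best_k)
print("chosen k:", best_k)
print("labels:", labels)
```

On real data a library implementation (for example scikit-learn's AgglomerativeClustering) would normally replace the hand-written loop; the pure-Python version is used here only to keep the sketch dependency-free.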