
Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification

Bibliographic Details
Main authors:	Quist, Jelmar; Taylor, Lawson; Staaf, Johan; Grigoriadis, Anita
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7956671/ https://www.ncbi.nlm.nih.gov/pubmed/33673506 http://dx.doi.org/10.3390/cancers13050991

author	Quist, Jelmar; Taylor, Lawson; Staaf, Johan; Grigoriadis, Anita
collection	PubMed
description	SIMPLE SUMMARY: Breast cancer is a complex disease, and the identification of its underlying molecular mechanisms is critical for the development of treatment strategies. The purpose of this study was to implement a computational framework that is capable of combining many types of data into a meaningful classification. While our approach can be used on many types of data and in many diseases, we applied this framework to breast cancer data and identified six triple-negative breast cancer subtypes with distinct underlying molecular mechanisms. The relevance of our approach is highlighted by the clinical outcome analysis in which a group of patients responding poorly to standard-of-care adjuvant chemotherapy was identified. This study serves as a starting point for our computational framework, which can be extended to different types of data from different diseases.
ABSTRACT: Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.
format	Online Article Text
id	pubmed-7956671
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7956671 2021-03-16 Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification Quist, Jelmar Taylor, Lawson Staaf, Johan Grigoriadis, Anita Cancers (Basel) Article MDPI 2021-02-27 /pmc/articles/PMC7956671/ /pubmed/33673506 http://dx.doi.org/10.3390/cancers13050991 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7956671/ https://www.ncbi.nlm.nih.gov/pubmed/33673506 http://dx.doi.org/10.3390/cancers13050991
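The description field above mentions a permutation-based framework for assessing relative feature importance in random forests built on mixed-type data. The record does not describe how the authors' framework works, so the following is only a minimal sketch of the generic, model-agnostic permutation importance idea (Breiman-style) and should not be read as the authors' method. The function names `permutation_importance` and `toy_predict`, the toy dataset, and the accuracy metric are all hypothetical choices made for illustration; `toy_predict` stands in for a fitted random forest. In practice one would more likely use an established implementation such as scikit-learn's `sklearn.inspection.permutation_importance` on a fitted `RandomForestClassifier`.

```python
import random
from typing import Any, Callable, List, Sequence

Row = List[Any]  # one sample; columns may be float, bool, str, ... (mixed types)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled on its own.

    Hypothetical helper for illustration, not the authors' framework.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            # Shuffling within the column keeps its own distribution (whatever
            # its type) but breaks its link to the labels.
            rng.shuffle(column)
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical mixed-type toy data: continuous, binary and categorical columns.
data_rng = random.Random(42)
X = [
    [data_rng.random(), data_rng.random() < 0.5, data_rng.choice(["A", "B", "C"])]
    for _ in range(200)
]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(rows: List[Row]) -> List[int]:
    # Hypothetical stand-in for a trained random forest: it only uses column 0.
    return [int(row[0] > 0.5) for row in rows]


print(permutation_importance(toy_predict, X, y))
```

Because `toy_predict` ignores the binary and categorical columns, shuffling them leaves accuracy unchanged and their importances come out as exactly 0.0, while the continuous column receives a clearly positive score. The same shuffling step applies unchanged to every column type, which is one reason permutation schemes are attractive for mixed-type data; note, however, that plain permutation importance can understate features that are strongly correlated with others.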


Similar Items