ROSIE: RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data

The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be requir...

Bibliographic Details
Main Authors: Jensch, Antje; Lopes, Marta B.; Vinga, Susana; Radde, Nicole
Format: Online Article Text
Language: English
Published: SAGE Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014683/
https://www.ncbi.nlm.nih.gov/pubmed/35072570
http://dx.doi.org/10.1177/09622802211072456
author Jensch, Antje
Lopes, Marta B.
Vinga, Susana
Radde, Nicole
collection PubMed
description The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can largely impact classification accuracy. Here we introduce ROSIE, an ensemble classification approach, which combines three sparse and robust classification methods for outlier detection and feature selection and further performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using outlier rankings of all three methods, and important features are selected as features commonly selected by all methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of [Formula: see text] genes and more than [Formula: see text] samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the commonly selected genes by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. Our approach is ad hoc applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks.
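The two combination steps named in the description (a rank product over the outlier rankings of three methods, and keeping only features selected by every method) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the sample and gene identifiers, and the toy ranks are hypothetical. The sketch computes only the rank product statistic (the geometric mean of ranks); the significance assessment of the rank product test and the bootstrap-based validity check are not shown.

```python
from math import prod


def rank_product(rankings):
    """Geometric mean of each sample's rank across methods.

    rankings: list of dicts mapping sample id -> rank (1 = most outlying).
    Only the statistic is computed; no p-values are derived here.
    """
    k = len(rankings)
    samples = rankings[0].keys()
    return {s: prod(r[s] for r in rankings) ** (1.0 / k) for s in samples}


def common_features(selections):
    """Features selected by every method (set intersection)."""
    return set.intersection(*(set(s) for s in selections))


# Toy outlier ranks from three hypothetical methods (illustrative only).
ranks_a = {"s1": 1, "s2": 3, "s3": 2, "s4": 4}
ranks_b = {"s1": 2, "s2": 4, "s3": 1, "s4": 3}
ranks_c = {"s1": 1, "s2": 4, "s3": 2, "s4": 3}

scores = rank_product([ranks_a, ranks_b, ranks_c])
# Smaller rank product = more consistently ranked as an outlier.
ordered = sorted(scores, key=scores.get)

# Toy gene selections from the same three hypothetical methods.
genes = common_features([
    {"g1", "g2", "g5"},
    {"g1", "g5", "g7"},
    {"g1", "g3", "g5"},
])

print(ordered)
print(sorted(genes))
```

In this toy input, a sample ranked near the top by all three methods gets a small rank product, while a sample flagged by only one method is pulled down the list. Taking the intersection of the selected genes is the strictest way to combine the selections, since a gene dropped by any single method is excluded.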
format Online
Article
Text
id pubmed-9014683
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-90146832022-04-19 ROSIE: RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data Jensch, Antje Lopes, Marta B. Vinga, Susana Radde, Nicole Stat Methods Med Res Original Research Articles The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can largely impact classification accuracy. Here we introduce ROSIE, an ensemble classification approach, which combines three sparse and robust classification methods for outlier detection and feature selection and further performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using outlier rankings of all three methods, and important features are selected as features commonly selected by all methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of [Formula: see text] genes and more than [Formula: see text] samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the commonly selected genes by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. 
Our approach is ad hoc applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks. SAGE Publications 2022-01-24 2022-05 /pmc/articles/PMC9014683/ /pubmed/35072570 http://dx.doi.org/10.1177/09622802211072456 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title ROSIE: RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data
topic Original Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014683/
https://www.ncbi.nlm.nih.gov/pubmed/35072570
http://dx.doi.org/10.1177/09622802211072456