Meta-Analyzing Multiple Omics Data With Robust Variable Selection

High-throughput omics data are becoming more and more popular in various areas of science. Given that many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets to achieve more reliable results for model estimation and predicti...

Bibliographic Details
Main authors: Hu, Zongliang; Zhou, Yan; Tong, Tiejun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8288516/
https://www.ncbi.nlm.nih.gov/pubmed/34290735
http://dx.doi.org/10.3389/fgene.2021.656826
author Hu, Zongliang
Zhou, Yan
Tong, Tiejun
collection PubMed
description High-throughput omics data are becoming increasingly popular in various areas of science. Given that many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets and achieve more reliable results for model estimation and prediction. Due to the high dimensionality of omics data, it is also desirable to incorporate variable selection into meta-analysis. Existing variable selection methods for meta-analysis are often sensitive to the presence of outliers and may miss relevant covariates, especially with lasso-type penalties. In this paper, we develop a robust variable selection algorithm for meta-analyzing high-dimensional datasets based on logistic regression. We first search for an outlier-free subset of each dataset by borrowing information across the datasets, repeatedly using the least trimmed squares estimates for the logistic model together with a hierarchical bi-level variable selection technique. After obtaining a reliable non-outlier subset, we then refine a reweighting step to further improve the efficiency. Simulation studies and real data analysis show that our new method can provide more reliable results than the existing meta-analysis methods in the presence of outliers.
format Online
Article
Text
id pubmed-8288516
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8288516 2021-07-20 Meta-Analyzing Multiple Omics Data With Robust Variable Selection Hu, Zongliang Zhou, Yan Tong, Tiejun Front Genet Genetics Frontiers Media S.A. 2021-07-05 /pmc/articles/PMC8288516/ /pubmed/34290735 http://dx.doi.org/10.3389/fgene.2021.656826 Text en Copyright © 2021 Hu, Zhou and Tong. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Meta-Analyzing Multiple Omics Data With Robust Variable Selection
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8288516/
https://www.ncbi.nlm.nih.gov/pubmed/34290735
http://dx.doi.org/10.3389/fgene.2021.656826