
A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression


Bibliographic Details
Main authors: Huang, Hai-Hui, Rao, Hao, Miao, Rui, Liang, Yong
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9396780/
https://www.ncbi.nlm.nih.gov/pubmed/35999505
http://dx.doi.org/10.1186/s12859-022-04887-5
author Huang, Hai-Hui
Rao, Hao
Miao, Rui
Liang, Yong
collection PubMed
description BACKGROUND: Gene expression analysis can provide useful information for analyzing complex biological mechanisms. However, many reported findings cannot be reproduced, because most gene expression datasets have small sample sizes relative to the large number of genes, as well as low signal-to-noise ratios. RESULTS: Meta-analysis of multiple datasets is an efficient way to tackle this problem. To improve the performance of meta-analysis, we propose a novel meta-analysis framework consisting of two parts: (1) A novel data augmentation strategy. Various cross-platform normalization methods exist; each preserves the original biological information of gene expression datasets from a different angle while adding a different “perturbation” to the data. Exploiting these perturbations provides a feasible means of augmenting gene expression data. (2) Elastic data shared lasso (DSL-[Formula: see text]). The DSL-[Formula: see text] method spans the continuum between individual models for each dataset and a single model for all datasets. It also overcomes the shortcomings of the data shared lasso method when dealing with highly correlated features. Comprehensive simulation experiments show that the proposed method performs well in both prediction and gene selection. We then apply the proposed method to non-small cell lung cancer (NSCLC) blood gene expression data to identify key tumor-related genes. The results indicate that the method could be used to identify a set of robust disease-related gene signatures that may be useful for the early diagnosis, prognosis, or even therapeutic targeting of NSCLC. CONCLUSION: We propose a novel and effective meta-analysis method for biological research that extrapolates and integrates information from multiple gene expression datasets.
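The description says the method augments data through several normalizations and then "spans the continuum" between one model per dataset and one model for all datasets. A common way to write a data shared lasso objective, assumed here, is to minimize (1/2) Σ_k ||y_k − X_k(β + Δ_k)||² + λ(||β||₁ + Σ_k r_k ||Δ_k||₁), with a shared coefficient vector β and per-dataset deviations Δ_k; this can be fitted as an ordinary penalized regression on a stacked design matrix. The Python sketch below (NumPy and scikit-learn) is illustrative only and is not the authors' implementation: it uses a least-squares loss, swaps in scikit-learn's `ElasticNet` on the stacked design to add an L2 term (which here acts on r_k·Δ_k), and uses three hypothetical normalizations (`zscore`, `median_center`, `quantile_normalize`) as stand-ins for the cross-platform methods. The helper names and the r_k weighting convention are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


# Step 1: augmentation. Each function is a HYPOTHETICAL stand-in for a
# cross-platform normalization; each copy is a differently "perturbed" view.
def zscore(X):
    # Standardize each gene (column) within the dataset.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def median_center(X):
    # Subtract each gene's median within the dataset.
    return X - np.median(X, axis=0)


def quantile_normalize(X):
    # Give every sample (row) the same value distribution: the mean sorted row.
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)
    reference = np.sort(X, axis=1).mean(axis=0)
    return reference[ranks]


NORMALIZERS = (zscore, median_center, quantile_normalize)


def augment(X, y):
    # Stack one normalized copy of the dataset per method; labels are repeated.
    copies = [f(X) for f in NORMALIZERS]
    return np.vstack(copies), np.concatenate([y] * len(copies))


# Step 2: elastic-net data shared lasso via a stacked (augmented) design matrix.
def dsl_design(blocks, r):
    # Row block k is [X_k, 0, ..., X_k / r_k, ..., 0]: shared columns first,
    # then one column group per dataset for its deviation Delta_k.
    K, p = len(blocks), blocks[0].shape[1]
    rows = []
    for k, Xk in enumerate(blocks):
        parts = [Xk] + [Xk / r[k] if j == k else np.zeros((Xk.shape[0], p))
                        for j in range(K)]
        rows.append(np.hstack(parts))
    return np.vstack(rows)


def fit_elastic_dsl(blocks, ys, r, alpha=0.05, l1_ratio=0.7):
    p = blocks[0].shape[1]
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
    model.fit(dsl_design(blocks, r), np.concatenate(ys))
    beta = model.coef_[:p]
    # A coefficient gamma_k on X_k / r_k means Delta_k = gamma_k / r_k,
    # so the L1 penalty on gamma_k equals r_k * ||Delta_k||_1.
    deltas = [model.coef_[p * (k + 1):p * (k + 2)] / r[k] for k in range(len(blocks))]
    return beta, deltas


# Toy example: three simulated datasets that share the same three active genes.
rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
blocks, ys = [], []
for _ in range(3):
    X = rng.normal(size=(40, p))
    y = X @ beta_true + 0.1 * rng.normal(size=40)
    Xa, ya = augment(X, y)
    blocks.append(Xa)
    ys.append(ya)

beta, deltas = fit_elastic_dsl(blocks, ys, r=[1.0, 1.0, 1.0])
_, deltas_shared = fit_elastic_dsl(blocks, ys, r=[1e3, 1e3, 1e3])
print("selected shared genes:", np.flatnonzero(beta))
```

In this parameterization, a large r_k makes deviations expensive, so the fit collapses toward a single shared model (every vector in `deltas_shared` is zero); a small r_k makes deviations cheap, so the fit approaches separate per-dataset models.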
format Online
Article
Text
id pubmed-9396780
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9396780 2022-08-24 BMC Bioinformatics Research BioMed Central 2022-08-23 /pmc/articles/PMC9396780/ /pubmed/35999505 http://dx.doi.org/10.1186/s12859-022-04887-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression
topic Research