A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression
Main authors: | Huang, Hai-Hui; Rao, Hao; Miao, Rui; Liang, Yong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9396780/ https://www.ncbi.nlm.nih.gov/pubmed/35999505 http://dx.doi.org/10.1186/s12859-022-04887-5 |
author | Huang, Hai-Hui; Rao, Hao; Miao, Rui; Liang, Yong |
---|---|
collection | PubMed |
description | BACKGROUND: Gene expression analysis can provide useful information about complex biological mechanisms. However, many reported findings cannot be reproduced, because most gene expression datasets have small sample sizes relative to the large number of genes, as well as low signal-to-noise ratios. RESULTS: Meta-analysis of multiple datasets is an efficient way to tackle this problem. To improve the performance of meta-analysis, we propose a novel meta-analysis framework consisting of two parts: (1) a novel data augmentation strategy. Various cross-platform normalization methods exist; each preserves the original biological information of gene expression datasets from a different angle while adding different “perturbations” to the data. Using these perturbations, we provide a feasible means of augmenting gene expression data; (2) elastic data shared lasso (DSL-[Formula: see text]). The DSL-[Formula: see text] method spans the continuum between fitting an individual model for each dataset and fitting one model for all datasets. It also overcomes the shortcomings of the data shared lasso method when dealing with highly correlated features. Comprehensive simulation experiments show that the proposed method achieves high prediction and gene selection performance. We then apply the proposed method to non-small cell lung cancer (NSCLC) blood gene expression data to identify key tumor-related genes. The results indicate that the method could identify a set of robust disease-related gene signatures that may be useful for early NSCLC diagnosis, for prognosis, or even as therapeutic targets. CONCLUSION: We propose a novel and effective meta-analysis method for biological research that extrapolates and integrates information from multiple gene expression datasets. |
format | Online Article Text |
id | pubmed-9396780 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9396780 2022-08-24 BMC Bioinformatics Research BioMed Central 2022-08-23 /pmc/articles/PMC9396780/ /pubmed/35999505 http://dx.doi.org/10.1186/s12859-022-04887-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression |
topic | Research |
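The description says that DSL-[Formula: see text] spans the continuum between individual models for each dataset and one model for all datasets. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes (a) a data shared lasso reparameterization in which dataset k's coefficients are β + Δ_k and the L1 penalty on Δ_k is scaled by a per-dataset weight w_k, and (b) that "elastic" means adding a ridge (L2) term to the L1 penalty, as in the elastic net. The function names (`elastic_net_cd`, `dsl_design`, `fit_elastic_dsl`) and the weights are hypothetical, and the data augmentation step is not modeled.

```python
import numpy as np

# Assumption: "elastic DSL" is illustrated as the data shared lasso
# reparameterization fitted with an elastic-net penalty. This is a sketch,
# not the authors' DSL code.


def soft_threshold(x: float, t: float) -> float:
    """Scalar soft-thresholding operator used by the L1 (lasso) update."""
    return float(np.sign(x) * max(abs(x) - t, 0.0))


def elastic_net_cd(Z, y, lam, alpha, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for
    (1/2n)||y - Z theta||^2 + lam * (alpha*||theta||_1 + (1-alpha)/2*||theta||_2^2)."""
    n, p = Z.shape
    theta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()  # residual y - Z @ theta
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = theta[j]
            # Correlation of column j with the partial residual (theta_j added back).
            rho = Z[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                resid -= Z[:, j] * (new - old)
                theta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return theta


def dsl_design(Xs, weights):
    """Hypothetical augmented design: rows of dataset k hold [X_k, 0, ..., X_k / w_k, ..., 0]."""
    p = Xs[0].shape[1]
    K = len(Xs)
    Z = np.zeros((sum(X.shape[0] for X in Xs), p * (K + 1)))
    row = 0
    for k, (X, w) in enumerate(zip(Xs, weights)):
        n_k = X.shape[0]
        Z[row:row + n_k, :p] = X                             # shared block -> beta
        Z[row:row + n_k, p * (k + 1):p * (k + 2)] = X / w    # dataset-specific block
        row += n_k
    return Z


def fit_elastic_dsl(Xs, ys, lam, alpha, weights):
    """Return shared coefficients beta and per-dataset coefficients beta + Delta_k."""
    p = Xs[0].shape[1]
    theta = elastic_net_cd(dsl_design(Xs, weights), np.concatenate(ys), lam, alpha)
    beta = theta[:p]
    # Delta_k = delta_k / w_k, so the L1 penalty on Delta_k is scaled by w_k.
    coefs = [beta + theta[p * (k + 1):p * (k + 2)] / w for k, w in enumerate(weights)]
    return beta, coefs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b = np.zeros(10)
    b[:3] = [2.0, -1.5, 1.0]                      # effects shared by both datasets
    Xs = [rng.standard_normal((80, 10)), rng.standard_normal((60, 10))]
    ys = [X @ b + 0.1 * rng.standard_normal(X.shape[0]) for X in Xs]
    ys[1] = ys[1] + 1.5 * Xs[1][:, 4]             # effect present only in dataset 2
    for w in (1.0, 1e6):                          # small w: separate fits; huge w: one shared model
        beta, coefs = fit_elastic_dsl(Xs, ys, lam=0.05, alpha=0.9, weights=[w, w])
        print(w, np.round(coefs[1], 2))
```

Under these assumptions, a very large w_k makes any nonzero Δ_k too expensive, so every dataset uses the shared β (one model for all datasets), while a small w_k lets each dataset deviate cheaply (close to individual models). The ridge term plays the role it plays in the elastic net generally: it keeps a group of highly correlated genes from being reduced to one arbitrary representative.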