Prior information-assisted integrative analysis of multiple datasets
MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the “small sample size, high dimensionality” problem. To tackle this problem, an int...
Main Authors: | Wang, Feifei; Liang, Dongzuo; Li, Yang; Ma, Shuangge |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Original Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400378/ https://www.ncbi.nlm.nih.gov/pubmed/37490475 http://dx.doi.org/10.1093/bioinformatics/btad452 |
_version_ | 1785084431785000960 |
---|---|
author | Wang, Feifei Liang, Dongzuo Li, Yang Ma, Shuangge |
author_facet | Wang, Feifei Liang, Dongzuo Li, Yang Ma, Shuangge |
author_sort | Wang, Feifei |
collection | PubMed |
description | MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the “small sample size, high dimensionality” problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. “Blindly” searching over a vast number of variables may not be efficient. RESULTS: We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/. |
format | Online Article Text |
id | pubmed-10400378 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10400378 2023-08-05 Prior information-assisted integrative analysis of multiple datasets Wang, Feifei Liang, Dongzuo Li, Yang Ma, Shuangge Bioinformatics Original Paper MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the “small sample size, high dimensionality” problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. “Blindly” searching over a vast number of variables may not be efficient. RESULTS: We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/. Oxford University Press 2023-07-25 /pmc/articles/PMC10400378/ /pubmed/37490475 http://dx.doi.org/10.1093/bioinformatics/btad452 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Wang, Feifei Liang, Dongzuo Li, Yang Ma, Shuangge Prior information-assisted integrative analysis of multiple datasets |
title | Prior information-assisted integrative analysis of multiple datasets |
title_full | Prior information-assisted integrative analysis of multiple datasets |
title_fullStr | Prior information-assisted integrative analysis of multiple datasets |
title_full_unstemmed | Prior information-assisted integrative analysis of multiple datasets |
title_short | Prior information-assisted integrative analysis of multiple datasets |
title_sort | prior information-assisted integrative analysis of multiple datasets |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400378/ https://www.ncbi.nlm.nih.gov/pubmed/37490475 http://dx.doi.org/10.1093/bioinformatics/btad452 |
work_keys_str_mv | AT wangfeifei priorinformationassistedintegrativeanalysisofmultipledatasets AT liangdongzuo priorinformationassistedintegrativeanalysisofmultipledatasets AT liyang priorinformationassistedintegrativeanalysisofmultipledatasets AT mashuangge priorinformationassistedintegrativeanalysisofmultipledatasets |