Prior information-assisted integrative analysis of multiple datasets


Bibliographic Details
Main authors: Wang, Feifei; Liang, Dongzuo; Li, Yang; Ma, Shuangge
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400378/
https://www.ncbi.nlm.nih.gov/pubmed/37490475
http://dx.doi.org/10.1093/bioinformatics/btad452
author Wang, Feifei
Liang, Dongzuo
Li, Yang
Ma, Shuangge
collection PubMed
description MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the “small sample size, high dimensionality” problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. “Blindly” searching over a vast number of variables may not be efficient. RESULTS: We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/.
format Online
Article
Text
id pubmed-10400378
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10400378 2023-08-05 Prior information-assisted integrative analysis of multiple datasets Wang, Feifei Liang, Dongzuo Li, Yang Ma, Shuangge Bioinformatics Original Paper Oxford University Press 2023-07-25 /pmc/articles/PMC10400378/ /pubmed/37490475 http://dx.doi.org/10.1093/bioinformatics/btad452 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prior information-assisted integrative analysis of multiple datasets
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400378/
https://www.ncbi.nlm.nih.gov/pubmed/37490475
http://dx.doi.org/10.1093/bioinformatics/btad452