
TiMEG: an integrative statistical method for partially missing multi-omics data

Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited,...


Bibliographic Details
Main authors: Das, Sarmistha; Mukhopadhyay, Indranil
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674330/
https://www.ncbi.nlm.nih.gov/pubmed/34911979
http://dx.doi.org/10.1038/s41598-021-03034-z
author Das, Sarmistha
Mukhopadhyay, Indranil
collection PubMed
description Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case–control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
format Online
Article
Text
id pubmed-8674330
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8674330 2021-12-20 TiMEG: an integrative statistical method for partially missing multi-omics data. Das, Sarmistha; Mukhopadhyay, Indranil. Sci Rep, Article. Nature Publishing Group UK, 2021-12-15. /pmc/articles/PMC8674330/ /pubmed/34911979 http://dx.doi.org/10.1038/s41598-021-03034-z Text en © The Author(s) 2021.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title TiMEG: an integrative statistical method for partially missing multi-omics data
topic Article