A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification

Bibliographic details
Main authors: Chung, Ren-Hua; Kang, Chen-Yu
Format: Online article, text
Language: English
Published: Oxford University Press, 2019
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486474/
https://www.ncbi.nlm.nih.gov/pubmed/31029063
http://dx.doi.org/10.1093/gigascience/giz045
collection PubMed
description BACKGROUND: An integrative multi-omics analysis approach that combines multiple types of omics data, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics, has become increasingly popular for understanding the pathophysiology of complex diseases. Although many multi-omics analysis methods have been developed for complex disease studies, only a few simulation tools can simulate multiple types of omics data and model their relationships with disease status, and these tools have limitations in how they simulate multi-omics data.
RESULTS: We developed the multi-omics data simulator OmicsSIMLA, which simulates genomics (i.e., single-nucleotide polymorphisms [SNPs] and copy number variations), epigenomics (i.e., bisulphite sequencing), transcriptomics (i.e., RNA sequencing), and proteomics (i.e., normalized reverse phase protein array) data at the whole-genome level. Furthermore, the relationships between different types of omics data were modeled, such as methylation quantitative trait loci (SNPs influencing methylation), expression quantitative trait loci (SNPs influencing gene expression), and expression quantitative trait methylations (methylations influencing gene expression). More importantly, the relationships between these multi-omics data and disease status were modeled as well. We used OmicsSIMLA to simulate a multi-omics dataset for breast cancer under a hypothetical disease model and used the data to compare existing multi-omics analysis methods in terms of disease classification accuracy and runtime. We also used OmicsSIMLA to simulate a multi-omics dataset of a scale similar to that of a real ovarian cancer multi-omics dataset. The neural network–based multi-omics analysis method ATHENA was applied to both the real and the simulated data, and the results were compared. Our results demonstrated that complex disease mechanisms can be simulated by OmicsSIMLA, and ATHENA showed the highest prediction accuracy when the effects of multi-omics features (e.g., SNPs, copy number variations, and gene expression levels) on the disease were strong. Furthermore, ATHENA produced similar results when analyzing the simulated and the real ovarian cancer multi-omics data.
CONCLUSIONS: OmicsSIMLA will be useful for evaluating the performance of different multi-omics analysis methods. Sample sizes and power can also be calculated with OmicsSIMLA when planning a new multi-omics disease study.
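The abstract describes chained relationships: SNPs influencing methylation (mQTL), SNPs influencing gene expression (eQTL), methylation influencing gene expression (eQTM), and multi-omics features influencing disease status. As a minimal toy illustration of such a chain for one SNP, one CpG site, and one gene, the following sketch may help; it is not OmicsSIMLA's model or interface. The function name `simulate_cohort`, every parameter and effect size, the logit-scale methylation, and the logistic disease model are hypothetical choices made only for illustration.

```python
import math
import random


def logistic(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_cohort(n, seed=0, maf=0.3,
                    b_mqtl=0.8,   # hypothetical SNP -> methylation effect (mQTL)
                    b_eqtl=0.5,   # hypothetical SNP -> expression effect (eQTL)
                    b_eqtm=-0.6,  # hypothetical methylation -> expression effect (eQTM)
                    b0=-1.0, b_snp=0.4, b_expr=0.7):  # hypothetical disease-model coefficients
    """Toy chain SNP -> methylation -> expression -> disease. NOT OmicsSIMLA's model."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        # Genotype = number of minor alleles (0, 1, 2) from two independent allele draws.
        snp = int(rng.random() < maf) + int(rng.random() < maf)
        # mQTL: genotype shifts methylation on a logit scale; the result lies in (0, 1).
        methylation = logistic(b_mqtl * snp + rng.gauss(0.0, 1.0))
        # eQTL + eQTM: expression depends on genotype and on methylation, plus noise.
        expression = b_eqtl * snp + b_eqtm * methylation + rng.gauss(0.0, 1.0)
        # Disease status: Bernoulli draw from an assumed logistic model of the features.
        p_case = logistic(b0 + b_snp * snp + b_expr * expression)
        status = int(rng.random() < p_case)
        cohort.append({"snp": snp, "methylation": methylation,
                       "expression": expression, "status": status})
    return cohort


if __name__ == "__main__":
    data = simulate_cohort(2000, seed=1)
    n_cases = sum(d["status"] for d in data)
    print(n_cases, "cases,", len(data) - n_cases, "controls")
```

Varying the hypothetical effect sizes (for example `b_expr`) changes how strongly the features separate cases from controls, which is the kind of setting a simulation study varies when comparing classifiers under weak versus strong multi-omics effects.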
id pubmed-6486474
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: GigaScience (Research), published 2019-04-26 by Oxford University Press.
© The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research