MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis

Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data...

Bibliographic Details
Main authors: Yoo, Seungyeul; Huang, Tao; Campbell, Joshua D.; Lee, Eunjee; Tu, Zhidong; Geraci, Mark W.; Powell, Charles A.; Schadt, Eric E.; Spira, Avrum; Zhu, Jun
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4133046/
https://www.ncbi.nlm.nih.gov/pubmed/25122495
http://dx.doi.org/10.1371/journal.pcbi.1003790
collection PubMed
description Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
id pubmed-4133046
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4133046 2014-08-19 PLoS Comput Biol Research Article Public Library of Science 2014-08-14 /pmc/articles/PMC4133046/ /pubmed/25122495 http://dx.doi.org/10.1371/journal.pcbi.1003790 Text en © 2014 Yoo et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article