MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis
Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data...
Main Authors: | Yoo, Seungyeul; Huang, Tao; Campbell, Joshua D.; Lee, Eunjee; Tu, Zhidong; Geraci, Mark W.; Powell, Charles A.; Schadt, Eric E.; Spira, Avrum; Zhu, Jun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4133046/ https://www.ncbi.nlm.nih.gov/pubmed/25122495 http://dx.doi.org/10.1371/journal.pcbi.1003790 |
_version_ | 1782330709381218304 |
---|---|
author | Yoo, Seungyeul; Huang, Tao; Campbell, Joshua D.; Lee, Eunjee; Tu, Zhidong; Geraci, Mark W.; Powell, Charles A.; Schadt, Eric E.; Spira, Avrum; Zhu, Jun |
author_facet | Yoo, Seungyeul; Huang, Tao; Campbell, Joshua D.; Lee, Eunjee; Tu, Zhidong; Geraci, Mark W.; Powell, Charles A.; Schadt, Eric E.; Spira, Avrum; Zhu, Jun |
author_sort | Yoo, Seungyeul |
collection | PubMed |
description | Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets. |
format | Online Article Text |
id | pubmed-4133046 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-41330462014-08-19 MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis Yoo, Seungyeul Huang, Tao Campbell, Joshua D. Lee, Eunjee Tu, Zhidong Geraci, Mark W. Powell, Charles A. Schadt, Eric E. Spira, Avrum Zhu, Jun PLoS Comput Biol Research Article Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets. Public Library of Science 2014-08-14 /pmc/articles/PMC4133046/ /pubmed/25122495 http://dx.doi.org/10.1371/journal.pcbi.1003790 Text en © 2014 Yoo et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Yoo, Seungyeul Huang, Tao Campbell, Joshua D. Lee, Eunjee Tu, Zhidong Geraci, Mark W. Powell, Charles A. Schadt, Eric E. Spira, Avrum Zhu, Jun MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title | MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title_full | MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title_fullStr | MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title_full_unstemmed | MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title_short | MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis |
title_sort | modmatcher: multi-omics data matcher for integrative genomic analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4133046/ https://www.ncbi.nlm.nih.gov/pubmed/25122495 http://dx.doi.org/10.1371/journal.pcbi.1003790 |
work_keys_str_mv | AT yooseungyeul modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT huangtao modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT campbelljoshuad modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT leeeunjee modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT tuzhidong modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT geracimarkw modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT powellcharlesa modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT schadterice modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT spiraavrum modmatchermultiomicsdatamatcherforintegrativegenomicanalysis AT zhujun modmatchermultiomicsdatamatcherforintegrativegenomicanalysis |
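
The description field above says that different omics data types are linked by cis-regulation and that this link can be used to find and correct sample labeling errors. The sketch below illustrates that general idea only; it is not the MODMatcher implementation, and the record does not specify the similarity measure or matching rule the authors use. It assumes, hypothetically, that each sample has one profile per data type measured over the same ordered list of cis-linked feature pairs, scores sample pairs with Pearson correlation, and accepts only mutual best matches. All function names, sample labels, and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_match(query, candidates):
    """Label of the candidate profile most correlated with the query profile."""
    return max(candidates, key=lambda label: pearson(query, candidates[label]))


def mutual_best_matches(profiles_a, profiles_b):
    """Map each label in data type A to its label in data type B,
    keeping only pairs that are each other's best match."""
    matches = {}
    for label_a, values_a in profiles_a.items():
        label_b = best_match(values_a, profiles_b)
        if best_match(profiles_b[label_b], profiles_a) == label_a:
            matches[label_a] = label_b
    return matches


# Toy data (hypothetical): values over five cis-linked feature pairs.
# In data type B the labels S2 and S3 have been swapped on purpose.
profiles_a = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "S3": [1.0, 5.0, 1.0, 5.0, 1.0],
}
profiles_b = {
    "S1": [1.1, 2.2, 2.9, 4.1, 5.3],
    "S2": [0.9, 5.2, 1.1, 4.8, 1.2],
    "S3": [5.1, 3.9, 3.2, 2.1, 0.8],
}

matches = mutual_best_matches(profiles_a, profiles_b)
# Samples whose best cross-data match carries a different label are flagged.
mislabeled = {a: b for a, b in matches.items() if a != b}
print(matches)
print(mislabeled)
```

Requiring the match to hold in both directions is a conservative choice for this sketch: a sample with no clear counterpart is left unmatched rather than relabeled.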