An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets

MOTIVATION: International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disea...

Bibliographic Details
Main authors: Schmidt, Florian; List, Markus; Cukuroglu, Engin; Köhler, Sebastian; Göke, Jonathan; Schulz, Marcel H
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129283/
https://www.ncbi.nlm.nih.gov/pubmed/30423059
http://dx.doi.org/10.1093/bioinformatics/bty553
_version_ 1783353773606830080
author Schmidt, Florian
List, Markus
Cukuroglu, Engin
Köhler, Sebastian
Göke, Jonathan
Schulz, Marcel H
author_facet Schmidt, Florian
List, Markus
Cukuroglu, Engin
Köhler, Sebastian
Göke, Jonathan
Schulz, Marcel H
author_sort Schmidt, Florian
collection PubMed
description MOTIVATION: International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study if batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine if batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers. RESULTS: We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish if batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the-art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here. AVAILABILITY AND IMPLEMENTATION: Our method is available online at https://github.com/SchulzLab/OntologyEval. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6129283
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-61292832018-09-12 An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets Schmidt, Florian List, Markus Cukuroglu, Engin Köhler, Sebastian Göke, Jonathan Schulz, Marcel H Bioinformatics Eccb 2018: European Conference on Computational Biology Proceedings MOTIVATION: International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study if batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine if batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers. RESULTS: We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish if batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here. AVAILABILITY AND IMPLEMENTATION: Our method is available online at https://github.com/SchulzLab/OntologyEval. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2018-09-01 2018-09-08 /pmc/articles/PMC6129283/ /pubmed/30423059 http://dx.doi.org/10.1093/bioinformatics/bty553 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Eccb 2018: European Conference on Computational Biology Proceedings
Schmidt, Florian
List, Markus
Cukuroglu, Engin
Köhler, Sebastian
Göke, Jonathan
Schulz, Marcel H
An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title_full An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title_fullStr An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title_full_unstemmed An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title_short An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
title_sort ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets
topic Eccb 2018: European Conference on Computational Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129283/
https://www.ncbi.nlm.nih.gov/pubmed/30423059
http://dx.doi.org/10.1093/bioinformatics/bty553
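
The description field states that the method checks whether batch adjustment brings observed pairwise sample similarity into better agreement with cell-type similarity inferred from the Cell Ontology. The following is a minimal sketch of that idea on toy data, not the authors' implementation (which is at https://github.com/SchulzLab/OntologyEval). Everything here is an assumption made for illustration: Pearson correlation as the observed similarity, Spearman rank correlation as the agreement score, hand-picked ontology similarity values (1.0 for the same cell type, 0.2 otherwise), and per-batch mean centering as a stand-in adjustment method. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def ontology_agreement(expression, ontology_similarity):
    """Agreement between observed and ontology-derived pairwise similarity.

    expression: one list of gene values per sample.
    ontology_similarity: square matrix, one row and column per sample.
    """
    n = len(expression)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    observed = [pearson(expression[i], expression[j]) for i, j in pairs]
    expected = [ontology_similarity[i][j] for i, j in pairs]
    return spearman(observed, expected)


def center_per_batch(expression, batches):
    """Stand-in adjustment (assumption): subtract each batch's per-gene mean."""
    adjusted = [list(row) for row in expression]
    for batch in set(batches):
        rows = [i for i, b in enumerate(batches) if b == batch]
        for g in range(len(expression[0])):
            mean = sum(expression[i][g] for i in rows) / len(rows)
            for i in rows:
                adjusted[i][g] -= mean
    return adjusted


# Toy data: two cell types measured in two batches, with a strong batch signal.
rng = random.Random(0)
n_genes = 200
type_profile = {t: [rng.gauss(0, 1) for _ in range(n_genes)] for t in ("A", "B")}
batch_profile = {b: [rng.gauss(0, 5) for _ in range(n_genes)] for b in (1, 2)}
cell_types = ["A", "B", "A", "B"]
batches = [1, 1, 2, 2]
expression = [
    [type_profile[t][g] + batch_profile[b][g] + rng.gauss(0, 0.1)
     for g in range(n_genes)]
    for t, b in zip(cell_types, batches)
]
# Hypothetical ontology similarity; real values would come from the Cell Ontology.
ontology_similarity = [[1.0 if s == t else 0.2 for t in cell_types]
                       for s in cell_types]

before = ontology_agreement(expression, ontology_similarity)
after = ontology_agreement(center_per_batch(expression, batches),
                           ontology_similarity)
print(f"agreement before: {before:.3f}, after: {after:.3f}")
```

In this balanced toy design the score rises after centering because samples stop clustering by batch. With unbalanced designs or few replicates, the setting the abstract highlights, such a naive adjustment could also remove biological signal, which is why an external reference such as the ontology is useful for judging the result.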