Ontology-based annotations and semantic relations in large-scale (epi)genomics data

Bibliographic Details
Main authors: Galeota, Eugenia, Pelizzola, Mattia
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429001/
https://www.ncbi.nlm.nih.gov/pubmed/27142216
http://dx.doi.org/10.1093/bib/bbw036
collection PubMed
description Public repositories of large-scale biological data currently contain hundreds of thousands of experiments, including high-throughput sequencing and microarray data. The potential of using these resources to assemble data sets combining samples previously not associated is largely unexplored. This requires the ability to associate samples with clear annotations and to relate experiments matched with different annotation terms. In this study, we illustrate the semantic annotation of Gene Expression Omnibus sample metadata using concepts from biomedical ontologies, focusing on the association of thousands of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) samples with a given target, tissue and disease state. Next, we demonstrate the feasibility of quantitatively measuring the semantic similarity between different samples, with the aim of combining experiments associated with the same or similar semantic annotations, thus allowing the generation of large data sets without the need for additional experiments. We compared tools based on the Unified Medical Language System with tools that use topic-specific ontologies, showing that the latter approach outperforms the former both in the annotation process and in the computation of semantic similarity measures. Finally, we demonstrate the potential of this approach by identifying semantically homogeneous groups of ChIP-seq samples targeting the Myc transcription factor, and expanding this data set with semantically coherent epigenetic samples. The semantic information of these data sets proved to be coherent with the ChIP-seq signal and with the current knowledge about this transcription factor.
id pubmed-5429001
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5429001 2017-05-17 Ontology-based annotations and semantic relations in large-scale (epi)genomics data Galeota, Eugenia Pelizzola, Mattia Brief Bioinform Papers
Oxford University Press 2017-05 2016-05-03 /pmc/articles/PMC5429001/ /pubmed/27142216 http://dx.doi.org/10.1093/bib/bbw036 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Papers