Ontology-based annotations and semantic relations in large-scale (epi)genomics data
Public repositories of large-scale biological data currently contain hundreds of thousands of experiments, including high-throughput sequencing and microarray data. The potential of using these resources to assemble data sets combining samples previously not associated is largely unexplored. This req...
Main authors: | Galeota, Eugenia; Pelizzola, Mattia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429001/ https://www.ncbi.nlm.nih.gov/pubmed/27142216 http://dx.doi.org/10.1093/bib/bbw036 |
_version_ | 1783235947169579008 |
---|---|
author | Galeota, Eugenia Pelizzola, Mattia |
author_facet | Galeota, Eugenia Pelizzola, Mattia |
author_sort | Galeota, Eugenia |
collection | PubMed |
description | Public repositories of large-scale biological data currently contain hundreds of thousands of experiments, including high-throughput sequencing and microarray data. The potential of using these resources to assemble data sets combining samples previously not associated is largely unexplored. This requires the ability to associate samples with clear annotations and to relate experiments matched with different annotation terms. In this study, we illustrate the semantic annotation of Gene Expression Omnibus sample metadata using concepts from biomedical ontologies, focusing on the association of thousands of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) samples with a given target, tissue and disease state. Next, we demonstrate the feasibility of quantitatively measuring the semantic similarity between different samples, with the aim of combining experiments associated with the same or similar semantic annotations, thus allowing the generation of large data sets without the need for additional experiments. We compared tools based on the Unified Medical Language System with tools that use topic-specific ontologies, showing that the second approach outperforms the first both in the annotation process and in the computation of semantic similarity measures. Finally, we demonstrated the potential of this approach by identifying semantically homogeneous groups of ChIP-seq samples targeting the Myc transcription factor, and expanding this data set with semantically coherent epigenetic samples. The semantic information of these data sets proved to be coherent with the ChIP-seq signal and with the current knowledge about this transcription factor. |
format | Online Article Text |
id | pubmed-5429001 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5429001 2017-05-17 Ontology-based annotations and semantic relations in large-scale (epi)genomics data Galeota, Eugenia Pelizzola, Mattia Brief Bioinform Papers Oxford University Press 2017-05 2016-05-03 /pmc/articles/PMC5429001/ /pubmed/27142216 http://dx.doi.org/10.1093/bib/bbw036 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Papers Galeota, Eugenia Pelizzola, Mattia Ontology-based annotations and semantic relations in large-scale (epi)genomics data |
title | Ontology-based annotations and semantic relations in large-scale (epi)genomics data |
title_sort | ontology-based annotations and semantic relations in large-scale (epi)genomics data |
topic | Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429001/ https://www.ncbi.nlm.nih.gov/pubmed/27142216 http://dx.doi.org/10.1093/bib/bbw036 |
work_keys_str_mv | AT galeotaeugenia ontologybasedannotationsandsemanticrelationsinlargescaleepigenomicsdata AT pelizzolamattia ontologybasedannotationsandsemanticrelationsinlargescaleepigenomicsdata |
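
The description in this record mentions quantitatively measuring the semantic similarity between samples annotated with ontology terms. The record does not say which measures the authors used, so the following is only a minimal sketch under stated assumptions: a made-up toy ontology (`TOY_ONTOLOGY`), a Jaccard overlap of ancestor sets as the term-level similarity, and a symmetric best-match average to compare two samples. All names here are hypothetical and are not taken from the article or from any particular tool.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (assumption, not from the article):
# each term maps to its direct parents via is_a edges.
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "cell": [],
    "blood cell": ["cell"],
    "lymphocyte": ["blood cell"],
    "B cell": ["lymphocyte"],
    "T cell": ["lymphocyte"],
    "epithelial cell": ["cell"],
}


def ancestors(term: str, ontology: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus every term reachable through parent edges."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology[current])
    return seen


def term_similarity(a: str, b: str, ontology: Dict[str, List[str]]) -> float:
    """Jaccard overlap of the two ancestor sets (1.0 for identical terms)."""
    anc_a = ancestors(a, ontology)
    anc_b = ancestors(b, ontology)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def sample_similarity(
    terms_a: List[str], terms_b: List[str], ontology: Dict[str, List[str]]
) -> float:
    """Symmetric best-match average between two samples' annotation terms."""

    def directed(src: List[str], dst: List[str]) -> float:
        # For each source term, keep its best score against the other sample.
        best = [max(term_similarity(s, d, ontology) for d in dst) for s in src]
        return sum(best) / len(best)

    return (directed(terms_a, terms_b) + directed(terms_b, terms_a)) / 2


if __name__ == "__main__":
    print(term_similarity("B cell", "T cell", TOY_ONTOLOGY))
    print(term_similarity("B cell", "epithelial cell", TOY_ONTOLOGY))
    print(sample_similarity(["B cell", "epithelial cell"], ["B cell"], TOY_ONTOLOGY))
```

In this toy setting, two samples annotated with sibling terms score higher than two samples whose terms share only the ontology root, which is the kind of ranking one would need before pooling experiments with similar annotations.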