Ontology-driven integrative analysis of omics data through Onassis

Bibliographic Details
Main authors: Galeota, Eugenia; Kishore, Kamal; Pelizzola, Mattia
Format: Online; Article; Text
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6971239/
https://www.ncbi.nlm.nih.gov/pubmed/31959844
http://dx.doi.org/10.1038/s41598-020-57716-1
collection PubMed
description Public repositories of large-scale omics datasets represent a valuable resource for researchers. In fact, data re-analysis can either answer novel questions or provide critical data able to complement in-house experiments. However, despite the development of standards for the compilation of metadata, the identification and organization of samples still constitutes a major bottleneck hampering data reuse. We introduce Onassis, an R package within the Bioconductor environment providing key functionalities of Natural Language Processing (NLP) tools. Leveraging biomedical ontologies, Onassis greatly simplifies the association of samples from large-scale repositories to their representation in terms of ontology-based annotations. Moreover, through the use of semantic similarity measures, Onassis hierarchically organizes the datasets of interest, thus supporting the semantically aware analysis of the corresponding omics data. In conclusion, Onassis leverages NLP techniques, biomedical ontologies, and the R statistical framework, to identify, relate, and analyze datasets from public repositories. The tool was tested on various large-scale datasets, including compendia of gene expression, histone marks, and DNA methylation, illustrating how it can facilitate the integrative analysis of various omics data.
id pubmed-6971239
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6971239 2020-01-27 Sci Rep Article
Nature Publishing Group UK 2020-01-20 /pmc/articles/PMC6971239/ /pubmed/31959844 http://dx.doi.org/10.1038/s41598-020-57716-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article