Ontology-driven integrative analysis of omics data through Onassis
Public repositories of large-scale omics datasets represent a valuable resource for researchers. In fact, data re-analysis can either answer novel questions or provide critical data able to complement in-house experiments. However, despite the development of standards for the compilation of metadata...
Main authors: | Galeota, Eugenia; Kishore, Kamal; Pelizzola, Mattia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6971239/ https://www.ncbi.nlm.nih.gov/pubmed/31959844 http://dx.doi.org/10.1038/s41598-020-57716-1 |
_version_ | 1783489682362859520 |
---|---|
author | Galeota, Eugenia; Kishore, Kamal; Pelizzola, Mattia |
author_sort | Galeota, Eugenia |
collection | PubMed |
description | Public repositories of large-scale omics datasets represent a valuable resource for researchers. In fact, data re-analysis can either answer novel questions or provide critical data able to complement in-house experiments. However, despite the development of standards for the compilation of metadata, the identification and organization of samples still constitutes a major bottleneck hampering data reuse. We introduce Onassis, an R package within the Bioconductor environment providing key functionalities of Natural Language Processing (NLP) tools. Leveraging biomedical ontologies, Onassis greatly simplifies the association of samples from large-scale repositories to their representation in terms of ontology-based annotations. Moreover, through the use of semantic similarity measures, Onassis hierarchically organizes the datasets of interest, thus supporting the semantically aware analysis of the corresponding omics data. In conclusion, Onassis leverages NLP techniques, biomedical ontologies, and the R statistical framework, to identify, relate, and analyze datasets from public repositories. The tool was tested on various large-scale datasets, including compendia of gene expression, histone marks, and DNA methylation, illustrating how it can facilitate the integrative analysis of various omics data. |
format | Online Article Text |
id | pubmed-6971239 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6971239 2020-01-27 Ontology-driven integrative analysis of omics data through Onassis Galeota, Eugenia; Kishore, Kamal; Pelizzola, Mattia Sci Rep Article Nature Publishing Group UK 2020-01-20 /pmc/articles/PMC6971239/ /pubmed/31959844 http://dx.doi.org/10.1038/s41598-020-57716-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Ontology-driven integrative analysis of omics data through Onassis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6971239/ https://www.ncbi.nlm.nih.gov/pubmed/31959844 http://dx.doi.org/10.1038/s41598-020-57716-1 |