Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base

Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespo...

Bibliographic Details
Main Authors:	Konopka, Tomasz; Ng, Sandra; Smedley, Damian
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382188/ https://www.ncbi.nlm.nih.gov/pubmed/34379637 http://dx.doi.org/10.1371/journal.pcbi.1009283

author	Konopka, Tomasz; Ng, Sandra; Smedley, Damian
collection	PubMed
description	Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.
format	Online Article Text
id	pubmed-8382188
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8382188 2021-08-24 Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base. Konopka, Tomasz; Ng, Sandra; Smedley, Damian. PLoS Comput Biol. Research Article.
Public Library of Science 2021-08-11 /pmc/articles/PMC8382188/ /pubmed/34379637 http://dx.doi.org/10.1371/journal.pcbi.1009283 Text en © 2021 Konopka et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382188/ https://www.ncbi.nlm.nih.gov/pubmed/34379637 http://dx.doi.org/10.1371/journal.pcbi.1009283
