
DISNET: a framework for extracting phenotypic disease information from public sources


Bibliographic Details
Main Authors: Lagunes-García, Gerardo, Rodríguez-González, Alejandro, Prieto-Santamaría, Lucía, García del Valle, Eduardo P., Zanin, Massimiliano, Menasalvas-Ruiz, Ernestina
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7032061/
https://www.ncbi.nlm.nih.gov/pubmed/32110491
http://dx.doi.org/10.7717/peerj.8580
_version_ 1783499495823114240
author Lagunes-García, Gerardo
Rodríguez-González, Alejandro
Prieto-Santamaría, Lucía
García del Valle, Eduardo P.
Zanin, Massimiliano
Menasalvas-Ruiz, Ernestina
collection PubMed
description BACKGROUND: Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread through several information sources. The creation of a comprehensive dataset of diseases and their clinical manifestations based on information from public sources is an interesting approach that allows one not only to complement and merge medical knowledge but also to extend it, and thereby to interconnect existing data and to analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge on signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks. METHODS: Here we present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from the Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. RESULTS: We further present the validation of our system on Wikipedia and PubMed texts and report the accuracy obtained. The final output includes a comprehensive symptoms-disease dataset, shared with free access through the system’s API. We finally describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses. DISCUSSION: DISNET allows the retrieval of knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category (it covers all the categories that the selected sources of information offer) or to specific clinical diagnosis terms. It further allows tracking the evolution of those terms through time, thus offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, suggesting that it is good enough to be used to extract diseases and diagnostically-relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system’s reliability.
format Online
Article
Text
id pubmed-7032061
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7032061 2020-02-27 DISNET: a framework for extracting phenotypic disease information from public sources Lagunes-García, Gerardo Rodríguez-González, Alejandro Prieto-Santamaría, Lucía García del Valle, Eduardo P. Zanin, Massimiliano Menasalvas-Ruiz, Ernestina PeerJ Bioinformatics PeerJ Inc. 2020-02-17 /pmc/articles/PMC7032061/ /pubmed/32110491 http://dx.doi.org/10.7717/peerj.8580 Text en ©2020 Lagunes-García et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title DISNET: a framework for extracting phenotypic disease information from public sources
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7032061/
https://www.ncbi.nlm.nih.gov/pubmed/32110491
http://dx.doi.org/10.7717/peerj.8580