DLAD4U: deriving and prioritizing disease lists from PubMed literature

BACKGROUND: Due to recent technology advancements, disease-related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and lifestyle factors, disease symptoms, and treatment strategies...

Bibliographic Details
Main Authors: Shen, Junhui; Vasaikar, Suhas; Zhang, Bing
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309061/
https://www.ncbi.nlm.nih.gov/pubmed/30591010
http://dx.doi.org/10.1186/s12859-018-2463-0
author Shen, Junhui
Vasaikar, Suhas
Zhang, Bing
collection PubMed
description BACKGROUND: Due to recent technological advances, disease-related knowledge is growing rapidly. It has become nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and lifestyle factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate and easy-to-use disease search engine based on PubMed literature. RESULTS: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. The hypergeometric test was used to prioritize identified diseases for display to users. DLAD4U accepts any valid PubMed query, and the output results include a ranked disease list, information associated with each disease, chronologically ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation using selected genes and drugs as query terms and manually curated data as “gold standard”. For 100 genes that are associated with only one disease in the gold standard, the Mean Average Precision (MAP) measure from DLAD4U was 0.77, which clearly outperformed other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall and F-measure scores from DLAD4U were always higher than those from other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with a MAP of 0.90. CONCLUSIONS: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2463-0) contains supplementary material, which is available to authorized users.
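The abstract states that a hypergeometric test is used to prioritize the diseases found in the retrieved publications, but it does not give the exact parametrization. The following is a minimal sketch of one common way to set up such an enrichment test, not the authors' implementation. It assumes four counts are available for each disease: the number of publications in the whole corpus, the number of those annotated with the disease, the number of publications retrieved by the query, and the number of retrieved publications annotated with the disease. The function names `hypergeom_upper_tail` and `rank_diseases` and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, corpus_size, disease_total, retrieved):
    """Return P(X >= k) for X ~ Hypergeometric(corpus_size, disease_total, retrieved).

    k             -- retrieved publications annotated with the disease
    corpus_size   -- all publications in the background corpus (assumed known)
    disease_total -- publications in the corpus annotated with the disease
    retrieved     -- publications returned for the query
    """
    upper = min(disease_total, retrieved)
    # Exact integer arithmetic; math.comb returns 0 when the lower index exceeds the upper.
    favourable = sum(
        comb(disease_total, i) * comb(corpus_size - disease_total, retrieved - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(corpus_size, retrieved)


def rank_diseases(hits, disease_totals, retrieved, corpus_size):
    """Sort diseases by ascending p-value (most enriched first).

    hits           -- dict: disease -> count among retrieved publications
    disease_totals -- dict: disease -> count in the whole corpus
    """
    scored = [
        (disease, hypergeom_upper_tail(k, corpus_size, disease_totals[disease], retrieved))
        for disease, k in hits.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy numbers, purely illustrative: a corpus of 10 publications, 4 retrieved.
    hits = {"disease A": 3, "disease B": 1}
    disease_totals = {"disease A": 3, "disease B": 8}
    for disease, p_value in rank_diseases(hits, disease_totals, retrieved=4, corpus_size=10):
        print(disease, p_value)
```

In this toy setup, a rare disease that appears in most of the retrieved publications receives a small p-value and ranks above a common disease that appears only once. A production system would typically also correct for multiple testing, which the abstract does not discuss.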
format Online
Article
Text
id pubmed-6309061
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6309061 2019-01-03 DLAD4U: deriving and prioritizing disease lists from PubMed literature Shen, Junhui Vasaikar, Suhas Zhang, Bing BMC Bioinformatics Research BioMed Central 2018-12-28 /pmc/articles/PMC6309061/ /pubmed/30591010 http://dx.doi.org/10.1186/s12859-018-2463-0 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DLAD4U: deriving and prioritizing disease lists from PubMed literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309061/
https://www.ncbi.nlm.nih.gov/pubmed/30591010
http://dx.doi.org/10.1186/s12859-018-2463-0