DLAD4U: deriving and prioritizing disease lists from PubMed literature
BACKGROUND: Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies...
Main authors: | Shen, Junhui; Vasaikar, Suhas; Zhang, Bing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309061/ https://www.ncbi.nlm.nih.gov/pubmed/30591010 http://dx.doi.org/10.1186/s12859-018-2463-0 |
_version_ | 1783383331110387712 |
---|---|
author | Shen, Junhui Vasaikar, Suhas Zhang, Bing |
author_facet | Shen, Junhui Vasaikar, Suhas Zhang, Bing |
author_sort | Shen, Junhui |
collection | PubMed |
description | BACKGROUND: Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate and easy-to-use disease search engine based on PubMed literature. RESULTS: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. The hypergeometric test was used to prioritize identified diseases for displaying to users. DLAD4U accepts any valid queries for PubMed, and the output results include a ranked disease list, information associated with each disease, chronologically-ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation using selected genes and drugs as query terms and manually curated data as “gold standard”. For 100 genes that are associated with only one disease in the gold standard, the Mean Average Precision (MAP) measure from DLAD4U was 0.77, which clearly outperformed other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall and F-measure scores from DLAD4U were always higher than those from other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with an MAP of 0.90. CONCLUSIONS: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2463-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6309061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-63090612019-01-03 DLAD4U: deriving and prioritizing disease lists from PubMed literature Shen, Junhui Vasaikar, Suhas Zhang, Bing BMC Bioinformatics Research BACKGROUND: Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate and easy-to-use disease search engine based on PubMed literature. RESULTS: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. The hypergeometric test was used to prioritize identified diseases for displaying to users. DLAD4U accepts any valid queries for PubMed, and the output results include a ranked disease list, information associated with each disease, chronologically-ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation using selected genes and drugs as query terms and manually curated data as “gold standard”. For 100 genes that are associated with only one disease in the gold standard, the Mean Average Precision (MAP) measure from DLAD4U was 0.77, which clearly outperformed other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall and F-measure scores from DLAD4U were always higher than those from other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with an MAP of 0.90. 
CONCLUSIONS: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2463-0) contains supplementary material, which is available to authorized users. BioMed Central 2018-12-28 /pmc/articles/PMC6309061/ /pubmed/30591010 http://dx.doi.org/10.1186/s12859-018-2463-0 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Shen, Junhui Vasaikar, Suhas Zhang, Bing DLAD4U: deriving and prioritizing disease lists from PubMed literature |
title | DLAD4U: deriving and prioritizing disease lists from PubMed literature |
title_sort | dlad4u: deriving and prioritizing disease lists from pubmed literature |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309061/ https://www.ncbi.nlm.nih.gov/pubmed/30591010 http://dx.doi.org/10.1186/s12859-018-2463-0 |
work_keys_str_mv | AT shenjunhui dlad4uderivingandprioritizingdiseaselistsfrompubmedliterature AT vasaikarsuhas dlad4uderivingandprioritizingdiseaselistsfrompubmedliterature AT zhangbing dlad4uderivingandprioritizingdiseaselistsfrompubmedliterature |
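
The abstract in this record states that DLAD4U prioritizes diseases with a hypergeometric test but does not give the exact parametrization. The following is a minimal sketch of one common way to set up such a test, not the authors' implementation. It assumes the population is all indexed publications, the "successes" are publications annotated with a given disease, and the sample is the set of publications retrieved for a query. The function names (`hypergeom_upper_tail`, `rank_diseases`) and all counts are hypothetical and chosen for illustration only.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(n_total, n_disease, n_retrieved, n_overlap):
    """P(X >= n_overlap) for a hypergeometric variable X.

    n_total     -- publications in the whole corpus (assumed population)
    n_disease   -- corpus publications annotated with the disease
    n_retrieved -- publications returned for the query (sample size)
    n_overlap   -- retrieved publications annotated with the disease
    """
    # Restrict the sum to values that are actually possible.
    lo = max(n_overlap, n_retrieved - (n_total - n_disease), 0)
    hi = min(n_retrieved, n_disease)
    if lo > hi:
        return 0.0
    log_denom = log_comb(n_total, n_retrieved)
    p = sum(
        exp(
            log_comb(n_disease, i)
            + log_comb(n_total - n_disease, n_retrieved - i)
            - log_denom
        )
        for i in range(lo, hi + 1)
    )
    return min(p, 1.0)  # guard against floating-point overshoot


def rank_diseases(overlap_counts, corpus_counts, n_retrieved, n_total):
    """Sort diseases by ascending p-value (most enriched first)."""
    scored = [
        (
            disease,
            hypergeom_upper_tail(
                n_total, corpus_counts[disease], n_retrieved, k
            ),
        )
        for disease, k in overlap_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    ranking = rank_diseases(
        overlap_counts={"disease A": 40, "disease B": 3},
        corpus_counts={"disease A": 2_000, "disease B": 50_000},
        n_retrieved=200,
        n_total=1_000_000,
    )
    for disease, p_value in ranking:
        print(f"{disease}\t{p_value:.3g}")
```

Working in log space keeps the binomial coefficients from overflowing when the corpus size is in the millions. A production tool would typically also correct the p-values for multiple testing, which this sketch omits.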