Diseases 2.0: a weekly updated database of disease–gene associations from text mining and data integration

Bibliographic Details
Main authors: Grissa, Dhouha; Junge, Alexander; Oprea, Tudor I; Jensen, Lars Juhl
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216524/
https://www.ncbi.nlm.nih.gov/pubmed/35348648
http://dx.doi.org/10.1093/database/baac019
author Grissa, Dhouha
Junge, Alexander
Oprea, Tudor I
Jensen, Lars Juhl
collection PubMed
description The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease–gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in the number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease–gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated weekly and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org
format Online
Article
Text
id pubmed-9216524
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9216524 2022-06-23 Database (Oxford) Database Update Oxford University Press 2022-03-24 Text en
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Diseases 2.0: a weekly updated database of disease–gene associations from text mining and data integration
topic Database Update