PheneBank: a literature-based database of phenotypes

MOTIVATION: Significant effort has been spent by curators to create coding systems for phenotypes such as the Human Phenotype Ontology, as well as disease–phenotype annotations. We aim to support the discovery of literature-based phenotypes and integrate them into the knowledge discovery process. RE...

Bibliographic Details
Main Authors: Pilehvar, Mohammad Taher; Bernard, Adam; Smedley, Damian; Collier, Nigel
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8796364/
https://www.ncbi.nlm.nih.gov/pubmed/34788791
http://dx.doi.org/10.1093/bioinformatics/btab740
_version_ 1784641289021554688
author Pilehvar, Mohammad Taher
Bernard, Adam
Smedley, Damian
Collier, Nigel
author_facet Pilehvar, Mohammad Taher
Bernard, Adam
Smedley, Damian
Collier, Nigel
author_sort Pilehvar, Mohammad Taher
collection PubMed
description MOTIVATION: Significant effort has been spent by curators to create coding systems for phenotypes such as the Human Phenotype Ontology, as well as disease–phenotype annotations. We aim to support the discovery of literature-based phenotypes and integrate them into the knowledge discovery process. RESULTS: PheneBank is a Web-portal for retrieving human phenotype–disease associations that have been text-mined from the whole of Medline. Our approach exploits state-of-the-art machine learning for concept identification by utilizing an expert-annotated rare disease corpus from the PMC Text Mining subset. Evaluation of the system for entities is conducted on a gold-standard corpus of rare disease sentences and for associations against the Monarch initiative data. AVAILABILITY AND IMPLEMENTATION: The PheneBank Web-portal is freely available at http://www.phenebank.org. Annotated Medline data is available from Zenodo at DOI: 10.5281/zenodo.1408800. Semantic annotation software is freely available for non-commercial use at GitHub: https://github.com/pilehvar/phenebank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8796364
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8796364 2022-01-31 PheneBank: a literature-based database of phenotypes Pilehvar, Mohammad Taher Bernard, Adam Smedley, Damian Collier, Nigel Bioinformatics Applications Notes MOTIVATION: Significant effort has been spent by curators to create coding systems for phenotypes such as the Human Phenotype Ontology, as well as disease–phenotype annotations. We aim to support the discovery of literature-based phenotypes and integrate them into the knowledge discovery process. RESULTS: PheneBank is a Web-portal for retrieving human phenotype–disease associations that have been text-mined from the whole of Medline. Our approach exploits state-of-the-art machine learning for concept identification by utilizing an expert-annotated rare disease corpus from the PMC Text Mining subset. Evaluation of the system for entities is conducted on a gold-standard corpus of rare disease sentences and for associations against the Monarch initiative data. AVAILABILITY AND IMPLEMENTATION: The PheneBank Web-portal is freely available at http://www.phenebank.org. Annotated Medline data is available from Zenodo at DOI: 10.5281/zenodo.1408800. Semantic annotation software is freely available for non-commercial use at GitHub: https://github.com/pilehvar/phenebank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-11-12 /pmc/articles/PMC8796364/ /pubmed/34788791 http://dx.doi.org/10.1093/bioinformatics/btab740 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Notes
Pilehvar, Mohammad Taher
Bernard, Adam
Smedley, Damian
Collier, Nigel
PheneBank: a literature-based database of phenotypes
title PheneBank: a literature-based database of phenotypes
title_full PheneBank: a literature-based database of phenotypes
title_fullStr PheneBank: a literature-based database of phenotypes
title_full_unstemmed PheneBank: a literature-based database of phenotypes
title_short PheneBank: a literature-based database of phenotypes
title_sort phenebank: a literature-based database of phenotypes
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8796364/
https://www.ncbi.nlm.nih.gov/pubmed/34788791
http://dx.doi.org/10.1093/bioinformatics/btab740
work_keys_str_mv AT pilehvarmohammadtaher phenebankaliteraturebaseddatabaseofphenotypes
AT bernardadam phenebankaliteraturebaseddatabaseofphenotypes
AT smedleydamian phenebankaliteraturebaseddatabaseofphenotypes
AT colliernigel phenebankaliteraturebaseddatabaseofphenotypes