Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource

Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the d...

Bibliographic Details
Main authors:	Alnazzawi, Noha, Thompson, Paul, Ananiadou, Sophia
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028053/ https://www.ncbi.nlm.nih.gov/pubmed/27643689 http://dx.doi.org/10.1371/journal.pone.0162287

author	Alnazzawi, Noha Thompson, Paul Ananiadou, Sophia
collection	PubMed
description	Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm’s wider utility by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
format	Online Article Text
id	pubmed-5028053
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5028053 2016-09-27 Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource Alnazzawi, Noha Thompson, Paul Ananiadou, Sophia PLoS One Research Article Public Library of Science 2016-09-19 /pmc/articles/PMC5028053/ /pubmed/27643689 http://dx.doi.org/10.1371/journal.pone.0162287 Text en © 2016 Alnazzawi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028053/ https://www.ncbi.nlm.nih.gov/pubmed/27643689 http://dx.doi.org/10.1371/journal.pone.0162287
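
The description above says that PhenoNorm integrates a number of different similarity measures to link phenotype mentions to known concepts. The record does not say which measures are used or how they are combined, so the following is only a minimal sketch of the general idea, not the authors' method. The three measures, the unweighted average, the function names, and the toy concept identifiers and names are all assumptions made for illustration; they are not taken from the paper or from the UMLS Metathesaurus.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams; tolerant of small spelling changes."""
    def bigrams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def sequence_ratio(a: str, b: str) -> float:
    """difflib's matching-blocks similarity ratio between the two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Assumed set of measures; a real system would choose and weight its own.
MEASURES = (token_jaccard, bigram_dice, sequence_ratio)


def combined_score(mention: str, name: str) -> float:
    """Unweighted mean of all measures, so the result stays in [0, 1]."""
    return sum(measure(mention, name) for measure in MEASURES) / len(MEASURES)


def link_mention(mention: str, concepts: dict) -> tuple:
    """Return (concept_id, score) for the best-scoring concept name.

    Returns (None, 0.0) when no name has a score above zero.
    """
    best_id, best_score = None, 0.0
    for concept_id, names in concepts.items():
        for name in names:
            score = combined_score(mention, name)
            if score > best_score:
                best_id, best_score = concept_id, score
    return best_id, best_score


# Hypothetical toy terminology: the IDs are placeholders, not real UMLS CUIs.
TOY_CONCEPTS = {
    "TOY:001": ["congestive heart failure", "heart failure, congestive"],
    "TOY:002": ["shortness of breath", "dyspnea"],
    "TOY:003": ["peripheral edema", "swelling of the legs"],
}

if __name__ == "__main__":
    for mention in ("heart failure congestive", "short of breath"):
        print(mention, "->", link_mention(mention, TOY_CONCEPTS))
```

Averaging several measures means that a mention that differs from a concept name in word order, or by a small spelling variation, can still score well on at least one measure. In practice a score threshold would probably be needed so that mentions with no good match are left unlinked.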

Similar items