
Building a semantically annotated corpus for chronic disease complications using two document types

Narrative information in electronic health records (EHRs) contains a wealth of information related to patient health conditions. In addition, people use Twitter to express their experiences regarding personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other facto...


Bibliographic Details
Main author: Alnazzawi, Noha
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7971867/
https://www.ncbi.nlm.nih.gov/pubmed/33735207
http://dx.doi.org/10.1371/journal.pone.0247319
_version_ 1783666658377728000
author Alnazzawi, Noha
collection PubMed
description Narrative information in electronic health records (EHRs) contains a wealth of information related to patient health conditions. In addition, people use Twitter to express their experiences regarding personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Knowing detailed information about controlling disease risk factors has a great impact on modifying these risks and subsequently preventing disease complications. Text-mining tools provide efficient solutions to extract and integrate vital information related to disease complications hidden in the large volume of the narrative text. However, the development of text-mining tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is unique and novel in terms of the very specific topic in the biomedical domain and as an integration of information from both EHRs and tweets collected from Twitter. The annotation scheme was designed with guidance by a domain expert, and two further domain experts performed the annotation, resulting in a high-quality annotation, with agreement rate F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
format Online
Article
Text
id pubmed-7971867
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7971867 2021-03-31 Building a semantically annotated corpus for chronic disease complications using two document types Alnazzawi, Noha PLoS One Research Article
Public Library of Science 2021-03-18 /pmc/articles/PMC7971867/ /pubmed/33735207 http://dx.doi.org/10.1371/journal.pone.0247319 Text en © 2021 Noha Alnazzawi http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Building a semantically annotated corpus for chronic disease complications using two document types
topic Research Article