Annotating and detecting phenotypic information for chronic obstructive pulmonary disease

OBJECTIVES: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train...

Bibliographic Details
Main authors: Ju, Meizhi; Short, Andrea D; Thompson, Paul; Bakerly, Nawar Diar; Gkoutos, Georgios V; Tsaprouni, Loukia; Ananiadou, Sophia
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951876/
https://www.ncbi.nlm.nih.gov/pubmed/31984360
http://dx.doi.org/10.1093/jamiaopen/ooz009
author Ju, Meizhi
Short, Andrea D
Thompson, Paul
Bakerly, Nawar Diar
Gkoutos, Georgios V
Tsaprouni, Loukia
Ananiadou, Sophia
collection PubMed
description OBJECTIVES: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information. MATERIALS AND METHODS: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions. RESULTS: Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information. DISCUSSION: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments. CONCLUSION: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.
format Online
Article
Text
id pubmed-6951876
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6951876 2020-01-24 Annotating and detecting phenotypic information for chronic obstructive pulmonary disease Ju, Meizhi Short, Andrea D Thompson, Paul Bakerly, Nawar Diar Gkoutos, Georgios V Tsaprouni, Loukia Ananiadou, Sophia JAMIA Open Research and Applications Oxford University Press 2019-04-26 /pmc/articles/PMC6951876/ /pubmed/31984360 http://dx.doi.org/10.1093/jamiaopen/ooz009 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Annotating and detecting phenotypic information for chronic obstructive pulmonary disease
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951876/
https://www.ncbi.nlm.nih.gov/pubmed/31984360
http://dx.doi.org/10.1093/jamiaopen/ooz009