
Automated Recognition of Brain Region Mentions in Neuroscience Literature

Bibliographic Details
Main authors: French, Leon; Lane, Suzanne; Xu, Lydia; Pavlidis, Paul
Format: Text
Language: English
Published: Frontiers Research Foundation, 2009
Subjects: Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741206/
https://www.ncbi.nlm.nih.gov/pubmed/19750194
http://dx.doi.org/10.3389/neuro.11.029.2009
collection PubMed
description The ability to computationally extract mentions of neuroanatomical regions from the literature would assist in linking them to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining of the neuroscience literature, we have created a corpus of manually annotated brain region mentions. The corpus contains 1,377 abstracts with 18,242 brain region annotations. Interannotator agreement was evaluated for a subset of the documents and was 90.7% and 96.7% for strict and lenient matching, respectively. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words. For automatic extraction of brain region mentions, we evaluated simple dictionary methods and complex natural language processing techniques. The dictionary methods based on neuroanatomical lexicons recalled 36% of the mentions with 57% precision. The best performance was achieved using a conditional random field (CRF) with a rich feature set. Features were based on morphological, lexical, syntactic and contextual information. The CRF recalled 76% of mentions at 81% precision; when partial matches are counted, recall and precision increase to 86% and 92%, respectively. We suspect that a large amount of the error is due to coordinating conjunctions, previously unseen words and brain regions of less commonly studied organisms. We found context windows, lemmatization and abbreviation expansion to be the most informative techniques. The corpus is freely available at http://www.chibi.ubc.ca/WhiteText/.
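The difference between strict and lenient (partial-match) scoring mentioned in the abstract can be illustrated with a small span-matching sketch. It assumes that mentions are half-open character spans and that a lenient match means any character overlap. The paper's exact matching rules may differ, and the helper `span`, the example sentence and the annotations are hypothetical.

```python
from typing import Callable, List, Tuple

# A mention is a half-open character span: (start, end), end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def exact(a: Span, b: Span) -> bool:
    """True if the two spans have identical boundaries."""
    return a == b


def precision_recall(gold: List[Span], pred: List[Span],
                     lenient: bool = False) -> Tuple[float, float]:
    """Score predicted mentions against gold mentions.

    Strict mode requires identical boundaries. Lenient mode (an assumed
    definition) accepts any character overlap as a match.
    """
    match: Callable[[Span, Span], bool] = overlaps if lenient else exact
    # Precision: fraction of predictions that match some gold mention.
    correct_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    # Recall: fraction of gold mentions matched by some prediction.
    found_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    return precision, recall


def span(text: str, phrase: str) -> Span:
    """Span of the first occurrence of phrase in text (hypothetical helper)."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Hypothetical sentence, gold annotations and tagger output.
text = "Neurons in the dorsal raphe nucleus project to the ventral tegmental area."
gold = [span(text, "dorsal raphe nucleus"), span(text, "ventral tegmental area")]
pred = [span(text, "raphe nucleus"),           # boundary error
        span(text, "ventral tegmental area"),  # exact match
        span(text, "Neurons")]                 # false positive

print("strict  P=%.2f R=%.2f" % precision_recall(gold, pred))                # strict  P=0.33 R=0.50
print("lenient P=%.2f R=%.2f" % precision_recall(gold, pred, lenient=True))  # lenient P=0.67 R=1.00
```

In this toy case, crediting the boundary error raises both precision and recall. This mirrors the direction, though not the magnitude, of the CRF's reported increase from 76% recall and 81% precision to 86% and 92% when partial matches are counted.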
id pubmed-2741206
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Front Neuroinformatics, 2009-09-01
Copyright © 2009 French, Lane, Xu and Pavlidis.
License (http://www.frontiersin.org/licenseagreement): This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
topic Neuroscience