Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application

We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomi...

Bibliographic Details
Main Authors: French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2015
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4439553/
https://www.ncbi.nlm.nih.gov/pubmed/26052282
http://dx.doi.org/10.3389/fninf.2015.00013
_version_ 1782372503204659200
author French, Leon
Liu, Po
Marais, Olivia
Koreman, Tianna
Tseng, Lucia
Lai, Artemis
Pavlidis, Paul
author_facet French, Leon
Liu, Po
Marais, Olivia
Koreman, Tianna
Tseng, Lucia
Lai, Artemis
Pavlidis, Paul
author_sort French, Leon
collection PubMed
description We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.
format Online
Article
Text
id pubmed-4439553
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-44395532015-06-05 Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application French, Leon Liu, Po Marais, Olivia Koreman, Tianna Tseng, Lucia Lai, Artemis Pavlidis, Paul Front Neuroinform Neuroscience We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/. Frontiers Media S.A. 2015-05-21 /pmc/articles/PMC4439553/ /pubmed/26052282 http://dx.doi.org/10.3389/fninf.2015.00013 Text en Copyright © 2015 French, Liu, Marais, Koreman, Tseng, Lai and Pavlidis. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. 
No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
French, Leon
Liu, Po
Marais, Olivia
Koreman, Tianna
Tseng, Lucia
Lai, Artemis
Pavlidis, Paul
Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title_full Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title_fullStr Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title_full_unstemmed Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title_short Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application
title_sort text mining for neuroanatomy using whitetext with an updated corpus and a new web application
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4439553/
https://www.ncbi.nlm.nih.gov/pubmed/26052282
http://dx.doi.org/10.3389/fninf.2015.00013
work_keys_str_mv AT frenchleon textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT liupo textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT maraisolivia textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT koremantianna textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT tsenglucia textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT laiartemis textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication
AT pavlidispaul textminingforneuroanatomyusingwhitetextwithanupdatedcorpusandanewwebapplication