
Text mining for the biocuration workflow

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal applicati...


Bibliographic Details
Main authors: Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3328793/
https://www.ncbi.nlm.nih.gov/pubmed/22513129
http://dx.doi.org/10.1093/database/bas020
author Hirschman, Lynette
Burns, Gully A. P. C
Krallinger, Martin
Arighi, Cecilia
Cohen, K. Bretonnel
Valencia, Alfonso
Wu, Cathy H.
Chatr-Aryamontri, Andrew
Dowell, Karen G.
Huala, Eva
Lourenço, Anália
Nash, Robert
Veuthey, Anne-Lise
Wiegers, Thomas
Winter, Andrew G.
collection PubMed
description Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
format Online
Article
Text
id pubmed-3328793
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3328793 2012-04-18 Text mining for the biocuration workflow Database (Oxford) Original Article Oxford University Press 2012-04-18 /pmc/articles/PMC3328793/ /pubmed/22513129 http://dx.doi.org/10.1093/database/bas020 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Text mining for the biocuration workflow
topic Original Article