Automatic extraction of angiogenesis bioprocess from text

Bibliographic Details
Main authors: Wang, Xinglong, McKendrick, Iain, Barrett, Ian, Dix, Ian, French, Tim, Tsujii, Jun'ichi, Ananiadou, Sophia
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179660/
https://www.ncbi.nlm.nih.gov/pubmed/21821664
http://dx.doi.org/10.1093/bioinformatics/btr460
author Wang, Xinglong
McKendrick, Iain
Barrett, Ian
Dix, Ian
French, Tim
Tsujii, Jun'ichi
Ananiadou, Sophia
collection PubMed
description Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is searching the literature. However, bioprocesses are difficult to capture because they may be expressed in text in a variety of ways. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, and current techniques, which rely solely on specialized lexicons, struggle to find them. Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, a key biological process involving the growth of new blood vessels, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods employed, respectively, domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance at a lower development cost. Availability: The angiogenesis vocabularies, gold standard corpus, annotation guidelines and software described in this article are available at http://text0.mib.man.ac.uk/~mbassxw2/angiogenesis/ Contact: xinglong.wang@gmail.com
format Online
Article
Text
id pubmed-3179660
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3179660 2011-09-26 Automatic extraction of angiogenesis bioprocess from text Wang, Xinglong McKendrick, Iain Barrett, Ian Dix, Ian French, Tim Tsujii, Jun'ichi Ananiadou, Sophia Bioinformatics Original Papers Oxford University Press 2011-10-01 2011-08-05 /pmc/articles/PMC3179660/ /pubmed/21821664 http://dx.doi.org/10.1093/bioinformatics/btr460 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automatic extraction of angiogenesis bioprocess from text
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179660/
https://www.ncbi.nlm.nih.gov/pubmed/21821664
http://dx.doi.org/10.1093/bioinformatics/btr460