
Semantic annotation of biological concepts interplaying microbial cellular responses

BACKGROUND: Automated extraction systems have become a time saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to Biomedical Text Mining tools is that of learning to recognise a wide variety o...


Bibliographic Details
Main authors: Carreira, Rafael; Carneiro, Sónia; Pereira, Rui; Rocha, Miguel; Rocha, Isabel; Ferreira, Eugénio C; Lourenço, Anália
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259143/
https://www.ncbi.nlm.nih.gov/pubmed/22122862
http://dx.doi.org/10.1186/1471-2105-12-460
author Carreira, Rafael
Carneiro, Sónia
Pereira, Rui
Rocha, Miguel
Rocha, Isabel
Ferreira, Eugénio C
Lourenço, Anália
author_sort Carreira, Rafael
collection PubMed
description BACKGROUND: Automated extraction systems have become a time saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to Biomedical Text Mining tools is that of learning to recognise a wide variety of biological concepts with different functional roles to assist in these processes. RESULTS: Here, we present a novel corpus concerning the integrated cellular responses to nutrient starvation in the model-organism Escherichia coli. Our corpus is a unique resource in that it annotates biomedical concepts that play a functional role in expression, regulation and metabolism. Namely, it includes annotations for genetic information carriers (genes and DNA, RNA molecules), proteins (transcription factors, enzymes and transporters), small metabolites, physiological states and laboratory techniques. The corpus consists of 130 full-text papers with a total of 59043 annotations for 3649 different biomedical concepts; the two dominant classes are genes (highest number of unique concepts) and compounds (most frequently annotated concepts), whereas other important cellular concepts such as proteins account for no more than 10% of the annotated concepts. CONCLUSIONS: To the best of our knowledge, a corpus that details such a wide range of biological concepts has never been presented to the text mining community. The inter-annotator agreement statistics provide evidence of the importance of a consolidated background when dealing with such complex descriptions, the ambiguities naturally arising from the terminology and their impact for modelling purposes. Availability is granted for the full-text corpora of 130 freely accessible documents, the annotation scheme and the annotation guidelines. Also, we include a corpus of 340 abstracts.
format Online
Article
Text
id pubmed-3259143
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3259143 2012-01-18 BMC Bioinformatics Research Article BioMed Central 2011-11-28 /pmc/articles/PMC3259143/ /pubmed/22122862 http://dx.doi.org/10.1186/1471-2105-12-460 Text en Copyright ©2011 Carreira et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semantic annotation of biological concepts interplaying microbial cellular responses
topic Research Article