Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

BACKGROUND: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory inte...

Bibliographic Details
Main authors: Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1964768/
https://www.ncbi.nlm.nih.gov/pubmed/17683642
http://dx.doi.org/10.1186/1471-2105-8-293
author Rodríguez-Penagos, Carlos
Salgado, Heladia
Martínez-Flores, Irma
Collado-Vides, Julio
collection PubMed
description BACKGROUND: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. RESULTS: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. CONCLUSION: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has already been annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.
format Text
id pubmed-1964768
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1964768 2007-09-05 BMC Bioinformatics Methodology Article BioMed Central 2007-08-07 /pmc/articles/PMC1964768/ /pubmed/17683642 http://dx.doi.org/10.1186/1471-2105-8-293 Text en Copyright © 2007 Rodríguez-Penagos et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automatic reconstruction of a bacterial regulatory network using Natural Language Processing
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1964768/
https://www.ncbi.nlm.nih.gov/pubmed/17683642
http://dx.doi.org/10.1186/1471-2105-8-293