Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties

Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. H...

Bibliographic Details
Main authors: Méndez-Cruz, Carlos-Francisco, Blanchet, Antonio, Godínez, Alan, Arroyo-Fernández, Ignacio, Gama-Castro, Socorro, Martínez-Luna, Sara Berenice, González-Colín, Cristian, Collado-Vides, Julio
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731926/
https://www.ncbi.nlm.nih.gov/pubmed/33306798
http://dx.doi.org/10.1093/database/baaa109
author Méndez-Cruz, Carlos-Francisco
Blanchet, Antonio
Godínez, Alan
Arroyo-Fernández, Ignacio
Gama-Castro, Socorro
Martínez-Luna, Sara Berenice
González-Colín, Cristian
Collado-Vides, Julio
author_sort Méndez-Cruz, Carlos-Francisco
collection PubMed
description Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties based on an automatic text summarization strategy. We were able to recover automatically a median 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility to assist manual curation to expand manual summaries with new knowledge automatically extracted and to create new summaries of bacteria for which these curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available in GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
format Online
Article
Text
id pubmed-7731926
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7731926 2020-12-16 Database (Oxford) Original Article Oxford University Press 2020-12-11 /pmc/articles/PMC7731926/ /pubmed/33306798 http://dx.doi.org/10.1093/database/baaa109 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731926/
https://www.ncbi.nlm.nih.gov/pubmed/33306798
http://dx.doi.org/10.1093/database/baaa109