Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow

Bibliographic Details
Main Authors: Thakur, Payal, Alaba, Mathew O., Rauniyar, Shailabh, Singh, Ram Nageena, Saxena, Priya, Bomgni, Alain, Gnimpieba, Etienne Z., Lushbough, Carol, Goh, Kian Mau, Sani, Rajesh Kumar
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867429/
https://www.ncbi.nlm.nih.gov/pubmed/36677411
http://dx.doi.org/10.3390/microorganisms11010119
author Thakur, Payal
Alaba, Mathew O.
Rauniyar, Shailabh
Singh, Ram Nageena
Saxena, Priya
Bomgni, Alain
Gnimpieba, Etienne Z.
Lushbough, Carol
Goh, Kian Mau
Sani, Rajesh Kumar
collection PubMed
description A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology-related corrosion studies, only a limited number of gene collections relate to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automated workflow is built on the Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed IDs and PMCIDs as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analyses were performed using UniProtKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, which yielded a gene set similar to that of our workflow; among these genes, hysB and hydA were identified as metal ion binding genes, and sat and dsrB as sulfur metabolism genes. The identified genes were mapped to the pangenome of 63 SRB genomes, which yielded their distribution across the 63 SRB based on amino acid sequence similarity, and were further categorized into core and accessory gene families.
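The last step above, splitting the mapped genes into core and accessory families, reduces to counting how many genomes carry each family. Below is a minimal Python sketch of that classification. It assumes each gene family has already been mapped, for example by an amino acid similarity search, to the set of genomes containing a homolog. The function name `classify_gene_families`, the `core_fraction` parameter, the extra `unmapped` label and the toy family and genome IDs are all hypothetical illustrations, not part of the described workflow.

```python
from typing import Dict, Set


def classify_gene_families(
    presence: Dict[str, Set[str]],
    n_genomes: int,
    core_fraction: float = 1.0,
) -> Dict[str, str]:
    """Label each gene family as 'core', 'accessory', or 'unmapped'.

    presence: hypothetical mapping from a gene-family ID to the set of
        genome IDs in which a homolog of that family was detected.
    n_genomes: number of genomes in the pangenome (63 in the study above).
    core_fraction: minimum share of genomes a family must occur in to
        count as core; 1.0 means "present in every genome".
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    if not 0.0 < core_fraction <= 1.0:
        raise ValueError("core_fraction must be in (0, 1]")

    labels: Dict[str, str] = {}
    for family, genomes in presence.items():
        if not genomes:
            # No homolog found in any genome: the gene did not map at all.
            labels[family] = "unmapped"
        elif len(genomes) / n_genomes >= core_fraction:
            # Present in at least the required share of genomes.
            labels[family] = "core"
        else:
            # Present in some, but not enough, genomes.
            labels[family] = "accessory"
    return labels


if __name__ == "__main__":
    # Toy data only: 63 placeholder genome IDs and three made-up families.
    genomes = {f"g{i}" for i in range(63)}
    toy = {"famA": genomes, "famB": {"g0", "g1"}, "famC": set()}
    print(classify_gene_families(toy, n_genomes=len(genomes)))
    # prints {'famA': 'core', 'famB': 'accessory', 'famC': 'unmapped'}
```

Pangenome tools differ in where they set the core cutoff. Some require presence in every genome, while others accept a slightly lower share, so the sketch takes the cutoff as a parameter rather than fixing it.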
SRB’s role in biocorrosion involves the transfer of electrons from the metal surface, with hydrogen as an intermediate, to the sulfate reduction pathway. Genes encoding hydrogenases and cytochromes might therefore participate in removing hydrogen from the metal surface through electron transfer. Moreover, the corrosive sulfide produced by sulfur metabolism indirectly contributes to localized pitting of the metals. Having corroborated the text mining results with known SRB biocorrosion mechanisms, we suggest that the text mining framework could be used to extract genes/proteins and could significantly reduce manual curation time.
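The hydrogen-mediated mechanism described above matches the classical cathodic depolarization scheme for SRB-driven corrosion of iron. The scheme is sketched below as a textbook simplification, not as reactions reported in this article. Iron dissolves at the anode, and protons reduced at the cathode form atomic hydrogen. SRB consume that hydrogen through sulfate reduction, and the resulting sulfide precipitates with dissolved iron.

```latex
\begin{align*}
\text{Anodic:}\quad & 4\,\mathrm{Fe} \rightarrow 4\,\mathrm{Fe}^{2+} + 8e^- \\
\text{Water dissociation:}\quad & 8\,\mathrm{H_2O} \rightarrow 8\,\mathrm{H}^+ + 8\,\mathrm{OH}^- \\
\text{Cathodic:}\quad & 8\,\mathrm{H}^+ + 8e^- \rightarrow 8\,\mathrm{H} \\
\text{SRB depolarization:}\quad & \mathrm{SO_4}^{2-} + 8\,\mathrm{H} \rightarrow \mathrm{S}^{2-} + 4\,\mathrm{H_2O} \\
\text{Products:}\quad & \mathrm{Fe}^{2+} + \mathrm{S}^{2-} \rightarrow \mathrm{FeS}, \qquad 3\,\mathrm{Fe}^{2+} + 6\,\mathrm{OH}^- \rightarrow 3\,\mathrm{Fe(OH)_2} \\
\text{Overall:}\quad & 4\,\mathrm{Fe} + \mathrm{SO_4}^{2-} + 4\,\mathrm{H_2O} \rightarrow \mathrm{FeS} + 3\,\mathrm{Fe(OH)_2} + 2\,\mathrm{OH}^-
\end{align*}
```

Summing the first five lines cancels the electrons, protons, atomic hydrogen, free Fe²⁺ and free sulfide, which leaves the overall reaction on the last line.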
format Online
Article
Text
id pubmed-9867429
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9867429 2023-01-22 Microorganisms Article MDPI 2023-01-03 /pmc/articles/PMC9867429/ /pubmed/36677411 http://dx.doi.org/10.3390/microorganisms11010119 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867429/
https://www.ncbi.nlm.nih.gov/pubmed/36677411
http://dx.doi.org/10.3390/microorganisms11010119