Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature

Many diseases are driven by gene-environment interactions. One important environmental factor is the metabolic output of human gut microbiota. A comprehensive catalog of human metabolites originating in microbes is critical for data-driven approaches to understanding how microbial metabolism contributes...

Bibliographic Details
Main authors: Wang, QuanQiu; Xu, Rong
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305201/
https://www.ncbi.nlm.nih.gov/pubmed/32561832
http://dx.doi.org/10.1038/s41598-020-67075-6
_version_ 1783548410095206400
author Wang, QuanQiu
Xu, Rong
author_facet Wang, QuanQiu
Xu, Rong
author_sort Wang, QuanQiu
collection PubMed
description Many diseases are driven by gene-environment interactions. One important environmental factor is the metabolic output of human gut microbiota. A comprehensive catalog of human metabolites originating in microbes is critical for data-driven approaches to understanding how microbial metabolism contributes to human health and disease. Here we present a novel integrated approach to automatically extract and analyze microbial metabolites from 28 million published biomedical records. First, we classified 28,851,232 MEDLINE records as microbial metabolism-related or not. Second, candidate microbial metabolites were extracted from the classified texts. Third, we developed signal prioritization algorithms to further differentiate microbial metabolites from metabolites originating from other sources. Finally, we systematically analyzed the interactions between extracted microbial metabolites and human genes. A total of 11,846 metabolites were extracted from 28 million MEDLINE articles. The combined text classification and signal prioritization significantly enriched true positives among the top-ranked candidates: manual curation of the top 100 metabolites showed a precision of 0.55, representing a significant 38.3-fold enrichment as compared to the precision of 0.014 for baseline extraction. More importantly, 29% of the extracted microbial metabolites have not been captured by existing databases. We performed data-driven analysis of the interactions between the extracted microbial metabolites and human genetics. This study represents the first effort towards automatically extracting and prioritizing microbial metabolites from published biomedical literature, which can set a foundation for future tasks of microbial metabolite relationship extraction from literature and facilitate data-driven studies of how microbial metabolism contributes to human diseases.
format Online
Article
Text
id pubmed-7305201
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7305201 2020-06-23 Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature Wang, QuanQiu Xu, Rong Sci Rep Article Nature Publishing Group UK 2020-06-19 /pmc/articles/PMC7305201/ /pubmed/32561832 http://dx.doi.org/10.1038/s41598-020-67075-6 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Wang, QuanQiu
Xu, Rong
Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title_full Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title_fullStr Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title_full_unstemmed Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title_short Automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
title_sort automatic extraction, prioritization and analysis of gut microbial metabolites from biomedical literature
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305201/
https://www.ncbi.nlm.nih.gov/pubmed/32561832
http://dx.doi.org/10.1038/s41598-020-67075-6
work_keys_str_mv AT wangquanqiu automaticextractionprioritizationandanalysisofgutmicrobialmetabolitesfrombiomedicalliterature
AT xurong automaticextractionprioritizationandanalysisofgutmicrobialmetabolitesfrombiomedicalliterature
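
Note on the enrichment figure in the description: fold enrichment is the ratio of the prioritized precision to the baseline precision. The short Python sketch below recomputes it from the two rounded values quoted in the abstract. The function names are illustrative only, and the count of 55 true positives is inferred from "precision of 0.55" over the top 100; neither comes from the article itself. With the rounded inputs the ratio comes out near 39.3 rather than the reported 38.3. One plausible explanation, stated here as an assumption, is that the authors used an unrounded baseline precision (roughly 0.0144), which still rounds to 0.014.

```python
def precision(true_positives: int, total: int) -> float:
    """Fraction of curated items that are true positives."""
    if total <= 0:
        raise ValueError("total must be positive")
    return true_positives / total


def fold_enrichment(prioritized: float, baseline: float) -> float:
    """Ratio of prioritized precision to baseline precision."""
    if baseline <= 0:
        raise ValueError("baseline precision must be positive")
    return prioritized / baseline


# Values quoted in the abstract (rounded as published).
top100_precision = precision(55, 100)  # 55 inferred from 0.55 over the top 100
baseline_precision = 0.014
reported_fold = 38.3

# Ratio recomputed from the rounded values.
recomputed_fold = fold_enrichment(top100_precision, baseline_precision)

# Baseline precision implied by the reported fold (assumption: 0.55 is exact).
implied_baseline = top100_precision / reported_fold

print(f"recomputed fold enrichment: {recomputed_fold:.1f}")
print(f"implied baseline precision: {implied_baseline:.4f}")
```

The implied baseline rounds to 0.014 at three decimal places, so the published figures are mutually consistent once rounding is taken into account.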