
Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait

BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employe...


Bibliographic Details
Main Authors: Singh, Gurnoor, Papoutsoglou, Evangelia A., Keijts-Lalleman, Frederique, Vencheva, Bilyana, Rice, Mark, Visser, Richard G.F., Bachem, Christian W.B., Finkers, Richard
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8070292/
https://www.ncbi.nlm.nih.gov/pubmed/33894758
http://dx.doi.org/10.1186/s12870-021-02943-5
_version_ 1783683436474531840
author Singh, Gurnoor
Papoutsoglou, Evangelia A.
Keijts-Lalleman, Frederique
Vencheva, Bilyana
Rice, Mark
Visser, Richard G.F.
Bachem, Christian W.B.
Finkers, Richard
author_facet Singh, Gurnoor
Papoutsoglou, Evangelia A.
Keijts-Lalleman, Frederique
Vencheva, Bilyana
Rice, Mark
Visser, Richard G.F.
Bachem, Christian W.B.
Finkers, Richard
author_sort Singh, Gurnoor
collection PubMed
description BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes. RESULTS: We trained an NLP model based on a manually annotated corpus of 34 full-text potato articles, to recognize relevant biological entities and relationships between them in text (genes, proteins, metabolites and traits). This model detected the number of biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles which focus on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), to determine that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) already two years prior to explicit statements of that connection in the literature. CONCLUSIONS: Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12870-021-02943-5).
format Online
Article
Text
id pubmed-8070292
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8070292 2021-04-26 Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait Singh, Gurnoor Papoutsoglou, Evangelia A. Keijts-Lalleman, Frederique Vencheva, Bilyana Rice, Mark Visser, Richard G.F. Bachem, Christian W.B. Finkers, Richard BMC Plant Biol Research Article BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes. RESULTS: We trained an NLP model based on a manually annotated corpus of 34 full-text potato articles, to recognize relevant biological entities and relationships between them in text (genes, proteins, metabolites and traits). This model detected the number of biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles which focus on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), to determine that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color.
A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) already two years prior to explicit statements of that connection in the literature. CONCLUSIONS: Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12870-021-02943-5). BioMed Central 2021-04-24 /pmc/articles/PMC8070292/ /pubmed/33894758 http://dx.doi.org/10.1186/s12870-021-02943-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Singh, Gurnoor
Papoutsoglou, Evangelia A.
Keijts-Lalleman, Frederique
Vencheva, Bilyana
Rice, Mark
Visser, Richard G.F.
Bachem, Christian W.B.
Finkers, Richard
Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title_full Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title_fullStr Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title_full_unstemmed Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title_short Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
title_sort extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8070292/
https://www.ncbi.nlm.nih.gov/pubmed/33894758
http://dx.doi.org/10.1186/s12870-021-02943-5