
LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks

BACKGROUND: The number of published full-text articles has increased dramatically. Text mining tools are essential for building biological networks, updating databases and annotating new pathways. PESCADOR is an online web server based on LAITOR and NLProt text mining...


Bibliographic Details
Main authors: Piereck, Bruna; Oliveira-Lima, Marx; Benko-Iseppon, Ana Maria; Diehl, Sarah; Schneider, Reinhard; Brasileiro-Vidal, Ana Christina; Barbosa-Silva, Adriano
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7447576/
https://www.ncbi.nlm.nih.gov/pubmed/32838742
http://dx.doi.org/10.1186/s12859-020-03620-4
author Piereck, Bruna
Oliveira-Lima, Marx
Benko-Iseppon, Ana Maria
Diehl, Sarah
Schneider, Reinhard
Brasileiro-Vidal, Ana Christina
Barbosa-Silva, Adriano
collection PubMed
description BACKGROUND: The number of published full-text articles has increased dramatically. Text mining tools are essential for building biological networks, updating databases and annotating new pathways. PESCADOR is an online web server based on the LAITOR and NLProt text mining tools, which retrieves protein-protein co-occurrences in a tabular format, adding a network schema. Here we present an HPC-oriented version of PESCADOR’s native text mining tool, renamed LAITOR4HPC, which aims to process an unlimited number of abstracts in a short time to enrich available networks, build new ones and possibly highlight whether fields of research have been exhaustively studied. RESULTS: By taking advantage of parallel computing on HPC infrastructure, the full collection of MEDLINE abstracts available until June 2017 was analyzed in 6 days, compared with an estimated 2 years for the original online implementation to process the same data. Additionally, three case studies were presented to illustrate possible uses of LAITOR4HPC. The first case study targeted soybean and was used to retrieve an overview of published co-occurrences in a single organism, retrieving 15,788 proteins in 7,894 co-occurrences. In the second case study, a target gene family was searched in many organisms by analyzing 15 species under biotic stress. Most co-occurrences involved Arabidopsis thaliana and Zea mays. The third case study concerned the construction and enrichment of an available pathway. With A. thaliana chosen for further analysis, the defensin pathway was enriched, showing additional signaling and regulation molecules and how they respond to each other in the modulation of this complex plant defense response. CONCLUSIONS: LAITOR4HPC can be used for efficient text-mining-based construction of biological networks derived from big data sources, such as MEDLINE abstracts. Time consumption and data input limitations will depend on the resources available at the HPC facility. LAITOR4HPC offers enough flexibility for different approaches and data volumes targeted at an organism, a subject, or a specific pathway. Additionally, it can deliver comprehensive results in which interactions are classified into four types according to their reliability.
format Online
Article
Text
id pubmed-7447576
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7447576 2020-08-27 LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks Piereck, Bruna; Oliveira-Lima, Marx; Benko-Iseppon, Ana Maria; Diehl, Sarah; Schneider, Reinhard; Brasileiro-Vidal, Ana Christina; Barbosa-Silva, Adriano BMC Bioinformatics Software BioMed Central 2020-08-24 /pmc/articles/PMC7447576/ /pubmed/32838742 http://dx.doi.org/10.1186/s12859-020-03620-4 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks
topic Software