Protein–Protein Interaction Network Extraction Using Text Mining Methods Adds Insight into Autism Spectrum Disorder

Bibliographic Details
Main authors: Nezamuldeen, Leena, Jafri, Mohsin Saleet
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10604135/
https://www.ncbi.nlm.nih.gov/pubmed/37887054
http://dx.doi.org/10.3390/biology12101344
collection PubMed
description SIMPLE SUMMARY: Research on proteins and their interactions with other proteins yields many new findings that help explain how diseases emerge. However, manual curation of the scientific literature slows new discoveries in the field. Artificial intelligence and deep learning techniques have played a significant part in extracting information from text. In this study, we used text mining and artificial intelligence techniques to extract protein–protein interaction networks from the vast body of scientific research literature. We created an automated system consisting of three models built with deep learning and natural language processing (NLP) methods. Our first model, which applies recurrent neural networks in a sentiment-analysis approach, achieved an accuracy of 95%. Our second model, which uses the NLP technique of named entity recognition, achieved an accuracy of 98%. Compared with the protein interaction network we discovered by manually curating more than 30 articles on Autism Spectrum Disorder, the automated system, tested on 6027 abstracts, successfully developed the network of interactions and provided an improved view. Discovering these networks will greatly help physicians and scientists understand how these molecules interact, yielding physiological, pharmacological, and pathological insight.
ABSTRACT: Text mining methods are being developed to keep pace with the continually expanding volume of biomedical literature. Understanding protein–protein interaction (PPI) deficits would help explain how diseases arise.
In this study, we designed an automated system that extracts PPIs from the biomedical literature using three models: a deep learning sentence classification model built from a pretrained word embedding and a BiLSTM recurrent neural network with additional layers; a conditional random field (CRF) named entity recognition (NER) model; and a shortest dependency path (SDP) model implemented with the spaCy library in Python. The system is designed to target sentences that describe PPIs, rather than sentences that merely mention the proteins in the context of disease discovery or other topics. Our first model achieved 13% greater precision on the AIMed/BioInfer benchmark corpus than previous state-of-the-art BiLSTM neural network models. The NER model presented in this study achieved 98% precision on the AIMed/BioInfer corpus, outperforming previous models. To produce an accurate representation of the PPI network, we developed processes that systematically map the protein interactions described in the texts. Overall, when evaluated on 6027 abstracts pertaining to seven proteins associated with Autism Spectrum Disorder, the system completed the manually curated PPI network for these proteins. For complex diseases, such networks can help explain how PPI deficits contribute to disease development, while also highlighting how interactions influence protein function and biological processes.
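The relation-extraction stage described above pairs an NER model with a shortest dependency path (SDP) model. As a minimal sketch of the SDP idea only, assuming a sentence has already been parsed into head indices (for a spaCy `Doc`, each token's `token.head.i`), the Python below finds the SDP between two protein mentions by breadth-first search and adds the pair to an adjacency-set network when a trigger word lies on the path. The trigger list, the placeholder names `PROTA`/`PROTB`, the hand-written parse, and all function names are hypothetical; this is not the authors' implementation, which relies on trained classification and NER models rather than a fixed word list.

```python
from collections import deque

# Hypothetical trigger words; the described system instead relies on trained
# sentence-classification and NER models to decide which sentences hold PPIs.
INTERACTION_TRIGGERS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}


def dependency_graph(heads):
    """Undirected adjacency sets built from head indices.

    heads[i] is the index of token i's syntactic head; the root points to itself,
    matching spaCy's convention where the root's token.head is the token itself.
    """
    graph = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:
            graph[child].add(head)
            graph[head].add(child)
    return graph


def shortest_dependency_path(heads, source, target):
    """Token indices on the shortest path from source to target (breadth-first search)."""
    graph = dependency_graph(heads)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if target not in previous:
        return []  # the two tokens are not connected
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def extract_interaction(tokens, heads, protein_a, protein_b):
    """Return (name_a, name_b) if a trigger word lies on the SDP between the two mentions."""
    path = shortest_dependency_path(heads, protein_a, protein_b)
    words_on_path = {tokens[i].lower() for i in path}
    if words_on_path & INTERACTION_TRIGGERS:
        return tokens[protein_a], tokens[protein_b]
    return None


def add_interaction(network, a, b):
    """Record an undirected edge in an adjacency-set PPI network."""
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)


# Hypothetical, hand-parsed sentence: "PROTA binds PROTB in neurons".
tokens = ["PROTA", "binds", "PROTB", "in", "neurons"]
heads = [1, 1, 1, 1, 3]  # "binds" is the root; "neurons" attaches to "in"

network = {}
pair = extract_interaction(tokens, heads, 0, 2)
if pair:
    add_interaction(network, *pair)
print(shortest_dependency_path(heads, 0, 2))  # [0, 1, 2]
print(network)
```

Keeping the network as adjacency sets makes it straightforward to compare an automatically extracted network against a manually curated one using set differences.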
id pubmed-10604135
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Biology (Basel), Article
published MDPI 2023-10-18
rights © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).