Analysis of Persian Bioinformatics Research with Topic Modeling

PURPOSE: As a scientific field, bioinformatics has drawn remarkable attention from various fields, such as information technology, mathematics, and modern biological sciences, in recent years. The topic models originating from the field of natural language processing have become the focus of attenti...

Bibliographic Details
Main Authors: Ebrahimi, Fezzeh; Dehghani, Mohammad; Makkizadeh, Fatemah
Format: Online Article Text
Language: English
Published: Hindawi 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10125747/
https://www.ncbi.nlm.nih.gov/pubmed/37101687
http://dx.doi.org/10.1155/2023/3728131
_version_ 1785030090159030272
author Ebrahimi, Fezzeh
Dehghani, Mohammad
Makkizadeh, Fatemah
collection PubMed
description PURPOSE: As a scientific field, bioinformatics has in recent years drawn remarkable attention from various fields, such as information technology, mathematics, and the modern biological sciences. With the rapid accumulation of biological datasets, topic models originating from natural language processing have become a focus of attention. This research therefore aimed to model the topic content of the bioinformatics literature published by Iranian researchers and indexed in the Scopus citation database. METHODOLOGY: This descriptive-exploratory study covered 3899 papers indexed in the Scopus database as of March 9, 2022. Topic modeling was performed on the titles and abstracts of the papers using a combination of LDA and TF-IDF. FINDINGS: The analysis identified seven main topics: “Molecular Modeling,” “Gene Expression,” “Biomarker,” “Coronavirus,” “Immunoinformatics,” “Cancer Bioinformatics,” and “Systems Biology.” “Systems Biology” formed the largest cluster and “Coronavirus” the smallest. CONCLUSION: The LDA algorithm performed acceptably in classifying the topics in this field. The extracted topic clusters showed excellent consistency and topical connection with one another.
format Online
Article
Text
id pubmed-10125747
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-10125747 2023-04-25 Analysis of Persian Bioinformatics Research with Topic Modeling Ebrahimi, Fezzeh Dehghani, Mohammad Makkizadeh, Fatemah Biomed Res Int Research Article PURPOSE: As a scientific field, bioinformatics has in recent years drawn remarkable attention from various fields, such as information technology, mathematics, and the modern biological sciences. With the rapid accumulation of biological datasets, topic models originating from natural language processing have become a focus of attention. This research therefore aimed to model the topic content of the bioinformatics literature published by Iranian researchers and indexed in the Scopus citation database. METHODOLOGY: This descriptive-exploratory study covered 3899 papers indexed in the Scopus database as of March 9, 2022. Topic modeling was performed on the titles and abstracts of the papers using a combination of LDA and TF-IDF. FINDINGS: The analysis identified seven main topics: “Molecular Modeling,” “Gene Expression,” “Biomarker,” “Coronavirus,” “Immunoinformatics,” “Cancer Bioinformatics,” and “Systems Biology.” “Systems Biology” formed the largest cluster and “Coronavirus” the smallest. CONCLUSION: The LDA algorithm performed acceptably in classifying the topics in this field. The extracted topic clusters showed excellent consistency and topical connection with one another. Hindawi 2023-04-17 /pmc/articles/PMC10125747/ /pubmed/37101687 http://dx.doi.org/10.1155/2023/3728131 Text en Copyright © 2023 Fezzeh Ebrahimi et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis of Persian Bioinformatics Research with Topic Modeling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10125747/
https://www.ncbi.nlm.nih.gov/pubmed/37101687
http://dx.doi.org/10.1155/2023/3728131
work_keys_str_mv AT ebrahimifezzeh analysisofpersianbioinformaticsresearchwithtopicmodeling
AT dehghanimohammad analysisofpersianbioinformaticsresearchwithtopicmodeling
AT makkizadehfatemah analysisofpersianbioinformaticsresearchwithtopicmodeling
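
The description field above says only that "a combination of LDA and TF-IDF" was applied to the titles and abstracts; the record does not say how the two were combined. As a rough illustration of one possible reading (an assumption, not the authors' pipeline), the sketch below uses TF-IDF scores to prune the vocabulary and then fits LDA by collapsed Gibbs sampling, using only the Python standard library. The four toy titles, the function names, and all parameter values (keep, n_topics, alpha, beta, n_iter, seed) are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase, replace non-letters with spaces, keep tokens longer than two characters.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2]


def tfidf_vocabulary(docs, keep):
    # Rank each term by its highest TF-IDF score in any document (an assumed
    # heuristic) and keep the top `keep` terms. Terms found in every document
    # get an IDF of zero and therefore sink to the bottom of the ranking.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best, key=lambda t: (-best[t], t))[:keep]


def lda_gibbs(docs, vocab, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    # Collapsed Gibbs sampling for LDA over the pruned vocabulary.
    rng = random.Random(seed)
    index = {term: i for i, term in enumerate(vocab)}
    corpus = [[index[t] for t in doc if t in index] for doc in docs]
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in corpus]
    topic_term = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(corpus):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_term[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_term[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_term[z][w] += 1
                topic_total[z] += 1
    # Smoothed document-topic proportions and the five most frequent terms per topic.
    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    top_terms = [
        [vocab[i] for i in sorted(range(n_vocab), key=lambda i: -row[i])[:5]]
        for row in topic_term
    ]
    return theta, top_terms


# Toy stand-ins for paper titles; not taken from the Scopus dataset.
toy_titles = [
    "Molecular docking study of protein ligand binding",
    "Protein ligand docking and molecular dynamics study",
    "Gene expression study of tumor biomarker genes",
    "Tumor gene expression and biomarker discovery study",
]
docs = [tokenize(t) for t in toy_titles]
vocab = tfidf_vocabulary(docs, keep=12)
theta, top_terms = lda_gibbs(docs, vocab, n_topics=2)
```

For a corpus the size of the 3899 records described above, a library implementation would be the practical choice; this version is meant only to make the two steps visible.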