Text mining of CHO bioprocess bibliome: Topic modeling and document classification

Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades...

Bibliographic Details
Main Authors: Wang, Qinghua; Olshin, Jonathan; Vijay-Shanker, K.; Wu, Cathy H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079098/
https://www.ncbi.nlm.nih.gov/pubmed/37022994
http://dx.doi.org/10.1371/journal.pone.0274042
author Wang, Qinghua
Olshin, Jonathan
Vijay-Shanker, K.
Wu, Cathy H.
collection PubMed
description Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To understand the CHO literature qualitatively and quantitatively, we conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and the computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers in new scientific literature, we developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results and yields insights into new CHO bioprocessing papers.
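The classification idea in the description (Logistic Regression over top-term features, with term weights used for explanation) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy abstracts, the vocabulary size, the document-frequency ranking, and the helper names (`top_terms`, `featurize`, `train`, `explain`) are all assumptions made for illustration, and a hand-written gradient-descent loop stands in for a library classifier.

```python
import math
from collections import Counter

# Toy labeled abstracts (hypothetical; NOT taken from the CHO bibliome).
# Label 1 = relevant to CHO bioprocessing, 0 = not relevant.
DOCS = [
    ("fed batch bioreactor culture improved cho cell titer", 1),
    ("glycosylation of antibody produced in cho cell culture", 1),
    ("bioreactor feed strategy raised antibody titer", 1),
    ("mouse behavior study of memory and learning", 0),
    ("clinical trial of memory drug in patients", 0),
    ("survey of learning outcomes in patients", 0),
]

def tokenize(text):
    return text.lower().split()

def top_terms(docs, k):
    """Pick the k terms with the highest document frequency (ties broken alphabetically)."""
    df = Counter(t for text, _ in docs for t in set(tokenize(text)))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def featurize(text, vocab):
    """Binary presence/absence vector over the top-term vocabulary."""
    tokens = set(tokenize(text))
    return [1.0 if term in tokens else 0.0 for term in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def explain(vocab, w, n=3):
    """Return the n terms with the largest positive weights."""
    ranked = sorted(zip(vocab, w), key=lambda tw: (-tw[1], tw[0]))
    return [term for term, _ in ranked[:n]]

vocab = top_terms(DOCS, 12)
X = [featurize(text, vocab) for text, _ in DOCS]
y = [label for _, label in DOCS]
w, b = train(X, y)

print("P(relevant):", predict_proba("antibody titer from a cho bioreactor", vocab, w, b))
print("Most indicative terms:", explain(vocab, w))
```

Because each feature is a single term, the learned weights can be read directly: terms with large positive weights are the ones pushing a paper toward the relevant class, which is the kind of explainability the description refers to.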
format Online
Article
Text
id pubmed-10079098
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10079098 2023-04-07 PLoS One Research Article
Public Library of Science 2023-04-06 Text en © 2023 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Text mining of CHO bioprocess bibliome: Topic modeling and document classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079098/
https://www.ncbi.nlm.nih.gov/pubmed/37022994
http://dx.doi.org/10.1371/journal.pone.0274042