Text mining of CHO bioprocess bibliome: Topic modeling and document classification
Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades....
Main Authors: | Wang, Qinghua, Olshin, Jonathan, Vijay-Shanker, K., Wu, Cathy H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079098/ https://www.ncbi.nlm.nih.gov/pubmed/37022994 http://dx.doi.org/10.1371/journal.pone.0274042 |
_version_ | 1785020657337106432 |
---|---|
author | Wang, Qinghua Olshin, Jonathan Vijay-Shanker, K. Wu, Cathy H. |
author_facet | Wang, Qinghua Olshin, Jonathan Vijay-Shanker, K. Wu, Cathy H. |
author_sort | Wang, Qinghua |
collection | PubMed |
description | Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we have conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and the computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers from new scientific literature, we have developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results to yield insights on new CHO bioprocessing papers. |
format | Online Article Text |
id | pubmed-10079098 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10079098 2023-04-07 Text mining of CHO bioprocess bibliome: Topic modeling and document classification Wang, Qinghua Olshin, Jonathan Vijay-Shanker, K. Wu, Cathy H. PLoS One Research Article Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we have conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and the computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers from new scientific literature, we have developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results to yield insights on new CHO bioprocessing papers.
Public Library of Science 2023-04-06 /pmc/articles/PMC10079098/ /pubmed/37022994 http://dx.doi.org/10.1371/journal.pone.0274042 Text en © 2023 Wang et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Wang, Qinghua Olshin, Jonathan Vijay-Shanker, K. Wu, Cathy H. Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title | Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title_full | Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title_fullStr | Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title_full_unstemmed | Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title_short | Text mining of CHO bioprocess bibliome: Topic modeling and document classification |
title_sort | text mining of cho bioprocess bibliome: topic modeling and document classification |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079098/ https://www.ncbi.nlm.nih.gov/pubmed/37022994 http://dx.doi.org/10.1371/journal.pone.0274042 |
work_keys_str_mv | AT wangqinghua textminingofchobioprocessbibliometopicmodelinganddocumentclassification AT olshinjonathan textminingofchobioprocessbibliometopicmodelinganddocumentclassification AT vijayshankerk textminingofchobioprocessbibliometopicmodelinganddocumentclassification AT wucathyh textminingofchobioprocessbibliometopicmodelinganddocumentclassification |