
Classification of cervical biopsy free-text diagnoses through linear-classifier based natural language processing



Bibliographic Details
Main authors:	Hsu, Jim Wei-Chun, Christensen, Paul, Ge, Yimin, Long, S. Wesley
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Original Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9577054/ https://www.ncbi.nlm.nih.gov/pubmed/36268101 http://dx.doi.org/10.1016/j.jpi.2022.100123

Collection:	PubMed
Description:	Routine cervical cancer screening has significantly decreased the incidence and mortality of cervical cancer. As selection of proper screening modalities depends on well-validated clinical decision algorithms, retrospective review correlating cytology and HPV test results with cervical biopsy diagnosis is essential for validating and revising these algorithms to changing technologies, demographics, and optimal clinical practices. However, manual categorization of the free-text biopsy diagnosis into discrete categories is extremely laborious due to the overwhelming number of specimens, which may lead to significant error and bias. Advances in machine learning and natural language processing (NLP), particularly over the last decade, have led to significant accomplishments and impressive performance in computer-based classification tasks. In this work, we apply an efficient version of an NLP framework, FastText™, to an annotated cervical biopsy dataset to create a supervised classifier that can assign accurate biopsy categories to free-text biopsy interpretations with high concordance to manually annotated data (>99.6%). We present cases where the machine-learning classifier disagrees with previous annotations and examine these discrepant cases after referee review by an expert pathologist. We also show that the classifier is robust on an untrained external dataset, achieving a concordance of 97.7%. In conclusion, we demonstrate a useful application of NLP to a real-world pathology classification task and highlight the benefits and limitations of this approach.
Record ID:	pubmed-9577054
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	J Pathol Inform (Elsevier), published 2022-07-01
Rights:	© 2022 The Authors. Published by Elsevier Inc. on behalf of Association for Pathology Informatics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
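The abstract describes training a supervised linear text classifier (the authors used FastText™) on manually annotated free-text biopsy diagnoses, measuring concordance with those annotations, and pulling out the discrepant cases for referee review by a pathologist. The following is a minimal Python sketch of that workflow under stated assumptions, not the authors' pipeline: it swaps in a plain multinomial logistic regression over word unigrams and bigrams trained by SGD instead of FastText itself, and the names `featurize`, `LinearTextClassifier`, `concordance`, the toy diagnoses, and the category labels are all hypothetical.

```python
import math
import random
import re
from collections import Counter, defaultdict


def featurize(text):
    """Bag of lowercase word unigrams plus word bigrams, L2-normalized.

    The bigrams loosely mirror FastText's optional word n-gram features; the
    L2 normalization is an assumption made here to keep SGD step sizes stable.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words + [a + "_" + b for a, b in zip(words, words[1:])])
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {f: c / norm for f, c in counts.items()}


class LinearTextClassifier:
    """Multinomial logistic regression (a linear classifier) trained by SGD.

    Simplified stand-in for FastText: no learned low-dimensional embeddings,
    no character n-grams, and a full softmax rather than a hierarchical one.
    """

    def __init__(self, lr=0.5, epochs=100, seed=0):
        self.lr, self.epochs, self.seed = lr, epochs, seed
        self.labels = []
        self.weights = {}  # weights[label][feature] -> float

    def _probs(self, feats):
        # Linear score per label, then a numerically stable softmax.
        scores = {
            lab: sum(self.weights[lab].get(f, 0.0) * v for f, v in feats.items())
            for lab in self.labels
        }
        top = max(scores.values())
        exps = {lab: math.exp(s - top) for lab, s in scores.items()}
        total = sum(exps.values())
        return {lab: e / total for lab, e in exps.items()}

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.weights = {lab: defaultdict(float) for lab in self.labels}
        data = [(featurize(t), y) for t, y in zip(texts, labels)]
        rng = random.Random(self.seed)
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, y in data:
                probs = self._probs(feats)
                for lab in self.labels:
                    # Cross-entropy gradient with respect to this label's score.
                    grad = probs[lab] - (1.0 if lab == y else 0.0)
                    for f, v in feats.items():
                        self.weights[lab][f] -= self.lr * grad * v
        return self

    def predict(self, text):
        probs = self._probs(featurize(text))
        return max(probs, key=probs.get)


def concordance(model, texts, labels):
    """Fraction of cases where the prediction equals the manual annotation, plus
    the discrepant (text, annotated, predicted) triples for referee review."""
    discrepant = []
    for text, annotated in zip(texts, labels):
        predicted = model.predict(text)
        if predicted != annotated:
            discrepant.append((text, annotated, predicted))
    return 1 - len(discrepant) / len(texts), discrepant


# Hypothetical toy diagnoses and categories, for illustration only.
TEXTS = [
    "Benign squamous mucosa, negative for dysplasia",
    "No dysplasia identified",
    "Chronic cervicitis, negative for intraepithelial lesion",
    "Low-grade squamous intraepithelial lesion (CIN 1)",
    "LSIL, CIN 1",
    "Koilocytic atypia consistent with low grade lesion",
    "High-grade squamous intraepithelial lesion (CIN 2)",
    "HSIL, CIN 3",
    "High grade dysplasia, CIN 2-3",
]
LABELS = ["benign"] * 3 + ["low_grade"] * 3 + ["high_grade"] * 3

if __name__ == "__main__":
    model = LinearTextClassifier().fit(TEXTS, LABELS)
    rate, cases = concordance(model, TEXTS, LABELS)
    print(f"training concordance: {rate:.3f}; discrepant cases: {len(cases)}")
    print(model.predict("HSIL (CIN 3)"))
```

Concordance in this sketch is simply the share of cases whose predicted category matches the annotated one, and the returned triples are the kind of disagreements the abstract describes sending for expert review. The toy run scores the model on its own training examples only to keep the example short; the abstract instead reports concordance against annotated data and on an untrained external dataset, which is the more meaningful check.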

