Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula
Purpose Radiology reports mostly contain free-text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors that are important for creating reliable, scalable methods for data analysis. The...
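Main Authors: | Dewald, Cornelia L.A.; Balandis, Alina; Becker, Lena S.; Hinrichs, Jan B.; von Falck, Christian; Wacker, Frank K.; Laser, Hans; Gerbel, Svetlana; Winther, Hinrich B.; Apfel-Starke, Johanna |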
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Georg Thieme Verlag KG
2023
|
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10368466/ https://www.ncbi.nlm.nih.gov/pubmed/37160146 http://dx.doi.org/10.1055/a-2061-6562 |
_version_ | 1785077511098466304 |
---|---|
author | Dewald, Cornelia L.A. Balandis, Alina Becker, Lena S. Hinrichs, Jan B. von Falck, Christian Wacker, Frank K. Laser, Hans Gerbel, Svetlana Winther, Hinrich B. Apfel-Starke, Johanna |
author_facet | Dewald, Cornelia L.A. Balandis, Alina Becker, Lena S. Hinrichs, Jan B. von Falck, Christian Wacker, Frank K. Laser, Hans Gerbel, Svetlana Winther, Hinrich B. Apfel-Starke, Johanna |
author_sort | Dewald, Cornelia L.A. |
collection | PubMed |
description | Purpose Radiology reports mostly contain free-text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors that are important for creating reliable, scalable methods for data analysis. The aim of this study is to classify unstructured radiograph reports according to fractures of the distal fibula and to find the best text mining method. Materials & Methods We established a novel German language report dataset: a designated search engine was used to identify radiographs of the ankle and the reports were manually labeled according to fractures of the distal fibula. This data was used to establish a machine learning pipeline, which implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of the accuracy (acc) and area under the curve (AUC). Results In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.9), LDA (AUC = 0.91; acc = 0.89) and doc2vec (AUC = 0.9; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85). Conclusion An automated classification of unstructured reports of radiographs of the ankle can reliably detect findings of fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model. Key Points: The aim was to classify unstructured radiograph reports according to distal fibula fractures. Our automated classification system can reliably detect fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model. Citation Format: Dewald CL, Balandis A, Becker LS et al. Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula. Fortschr Röntgenstr 2023; 195: 713 – 719. |
format | Online Article Text |
id | pubmed-10368466 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Georg Thieme Verlag KG |
record_format | MEDLINE/PubMed |
spelling | pubmed-103684662023-07-26 Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula Dewald, Cornelia L.A. Balandis, Alina Becker, Lena S. Hinrichs, Jan B. von Falck, Christian Wacker, Frank K. Laser, Hans Gerbel, Svetlana Winther, Hinrich B. Apfel-Starke, Johanna Rofo Purpose Radiology reports mostly contain free-text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors that are important for creating reliable, scalable methods for data analysis. The aim of this study is to classify unstructured radiograph reports according to fractures of the distal fibula and to find the best text mining method. Materials & Methods We established a novel German language report dataset: a designated search engine was used to identify radiographs of the ankle and the reports were manually labeled according to fractures of the distal fibula. This data was used to establish a machine learning pipeline, which implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of the accuracy (acc) and area under the curve (AUC). Results In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. 
Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.9), LDA (AUC = 0.91; acc = 0.89) and doc2vec (AUC = 0.9; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85). Conclusion An automated classification of unstructured reports of radiographs of the ankle can reliably detect findings of fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model. Key Points: The aim was to classify unstructured radiograph reports according to distal fibula fractures. Our automated classification system can reliably detect fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model. Citation Format: Dewald CL, Balandis A, Becker LS et al. Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula. Fortschr Röntgenstr 2023; 195: 713 – 719. Georg Thieme Verlag KG 2023-05-09 /pmc/articles/PMC10368466/ /pubmed/37160146 http://dx.doi.org/10.1055/a-2061-6562 Text en The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited. |
spellingShingle | Dewald, Cornelia L.A. Balandis, Alina Becker, Lena S. Hinrichs, Jan B. von Falck, Christian Wacker, Frank K. Laser, Hans Gerbel, Svetlana Winther, Hinrich B. Apfel-Starke, Johanna Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title | Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title_full | Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title_fullStr | Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title_full_unstemmed | Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title_short | Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula |
title_sort | automated classification of free-text radiology reports: using different feature extraction methods to identify fractures of the distal fibula |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10368466/ https://www.ncbi.nlm.nih.gov/pubmed/37160146 http://dx.doi.org/10.1055/a-2061-6562 |
work_keys_str_mv | AT dewaldcorneliala automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT balandisalina automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT beckerlenas automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT hinrichsjanb automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT vonfalckchristian automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT wackerfrankk automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT laserhans automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT gerbelsvetlana automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT wintherhinrichb automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula AT apfelstarkejohanna automatedclassificationoffreetextradiologyreportsusingdifferentfeatureextractionmethodstoidentifyfracturesofthedistalfibula |
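
The description field names bag-of-words (BOW) and TF-IDF as the two best-performing text representations in the study. The record does not include the authors' code, so the following is only a minimal sketch of how a report might be turned into such document vectors. Everything in it is an assumption made for illustration: the three toy reports (English stand-ins for the German reports), the whitespace tokenizer, the helper names, and the unsmoothed weighting `idf = log(N / df)`. Libraries commonly use smoothed variants of that formula.

```python
import math
from collections import Counter

# Hypothetical toy reports, not data from the study.
reports = [
    "fracture of the distal fibula",
    "no fracture of the ankle",
    "distal fibula intact no fracture",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bow_vector(doc, vocab):
    """Bag-of-words: one raw term count per vocabulary entry."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def tfidf_vectors(docs, vocab):
    """TF-IDF with raw counts as tf and unsmoothed idf = log(N / df)."""
    n_docs = len(docs)
    bows = [bow_vector(doc, vocab) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for vec in bows if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    return [[vec[j] * idf[j] for j in range(len(vocab))] for vec in bows]

vocab = build_vocabulary(reports)
bow = [bow_vector(report, vocab) for report in reports]
tfidf = tfidf_vectors(reports, vocab)

for report, vec in zip(reports, bow):
    print(vec, "<-", report)
```

In this toy corpus the word "fracture" occurs in every report, so its unsmoothed idf is zero and it contributes nothing to the TF-IDF vectors, while BOW still counts it. Vectors of either kind could then be passed to a classifier such as logistic regression, which is one of the three classifier families the description lists.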