A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity

Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. In this paper, a novel approach for standardizing categorical test results based on supervised machine lea...

Bibliographic Details
Main Authors: Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685145/
https://www.ncbi.nlm.nih.gov/pubmed/38034661
http://dx.doi.org/10.1016/j.heliyon.2023.e21523
author Ahmmed, Syed
Mondal, M. Rubaiyat Hossain
Mia, Md Raihan
Adibuzzaman, Mohammad
Hoque, Abu Sayed Md. Latiful
Ahamed, Sheikh Iqbal
collection PubMed
description Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. This paper proposes a novel approach for standardizing categorical test results based on supervised machine learning and the Jaro-Winkler similarity algorithm. In this approach, a supervised machine learning model performs scalable categorization of the test results into predefined groups or clusters, while Jaro-Winkler similarity maps text terms to standard clinical terms within the corresponding groups. The proposed method is applied to 75,062 test results from two private hospitals in Bangladesh. The Support Vector Classification algorithm with a linear kernel achieves a classification accuracy of 98%, which is better than the Random Forest algorithm when categorizing test results. The experimental results show that Jaro-Winkler similarity achieves a 99.93% success rate in test result standardization for the majority of groups, as confirmed by manual validation. The proposed method outperforms previous studies that concentrated on standardizing test results using rule-based classifiers on a smaller number of groups and distance similarities such as Cosine similarity or Levenshtein distance. Furthermore, when applied to the publicly available MIMIC-III dataset, the approach also performs excellently. These findings show that the proposed standardization technique can be very beneficial for clinical big data research, particularly for national clinical research data hubs in low- and middle-income countries.
format Online
Article
Text
id pubmed-10685145
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10685145 2023-11-30 A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal Heliyon Research Article
Elsevier 2023-11-08 /pmc/articles/PMC10685145/ /pubmed/38034661 http://dx.doi.org/10.1016/j.heliyon.2023.e21523 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685145/
https://www.ncbi.nlm.nih.gov/pubmed/38034661
http://dx.doi.org/10.1016/j.heliyon.2023.e21523