A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity
Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. In this paper, a novel approach for standardizing categorical test results based on supervised machine lea...
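Main Authors: | Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal |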
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685145/ https://www.ncbi.nlm.nih.gov/pubmed/38034661 http://dx.doi.org/10.1016/j.heliyon.2023.e21523 |
_version_ | 1785151564073140224 |
---|---|
author | Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal |
author_facet | Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal |
author_sort | Ahmmed, Syed |
collection | PubMed |
description | Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. In this paper, a novel approach for standardizing categorical test results based on supervised machine learning and the Jaro-Winkler similarity algorithm is proposed. A supervised machine learning model is used in this approach for scalable categorization of the test results into predefined groups or clusters, while Jaro-Winkler similarity is used to map text terms into standard clinical terms within these corresponding groups. The proposed method is applied to 75062 test results from two private hospitals in Bangladesh. The Support Vector Classification algorithm with a linear kernel has a classification accuracy of 98%, which is better than the Random Forest algorithm when categorizing test results. The experiment results show that Jaro-Winkler similarity achieves a remarkable 99.93% success rate in the test result standardization for the majority of groups with manual validation. The proposed method outperforms previous studies that concentrated on standardizing test results using rule-based classifiers on a smaller number of groups and distance similarities such as Cosine similarity or Levenshtein distance. Furthermore, when applied to the publicly available MIMIC-III dataset, our approach also performs excellently. All these findings show that the proposed standardization technique can be very beneficial for clinical big data research, particularly for national clinical research data hubs in low- and middle-income countries. |
format | Online Article Text |
id | pubmed-10685145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10685145 2023-11-30 A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal Heliyon Research Article Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. In this paper, a novel approach for standardizing categorical test results based on supervised machine learning and the Jaro-Winkler similarity algorithm is proposed. A supervised machine learning model is used in this approach for scalable categorization of the test results into predefined groups or clusters, while Jaro-Winkler similarity is used to map text terms into standard clinical terms within these corresponding groups. The proposed method is applied to 75062 test results from two private hospitals in Bangladesh. The Support Vector Classification algorithm with a linear kernel has a classification accuracy of 98%, which is better than the Random Forest algorithm when categorizing test results. The experiment results show that Jaro-Winkler similarity achieves a remarkable 99.93% success rate in the test result standardization for the majority of groups with manual validation. The proposed method outperforms previous studies that concentrated on standardizing test results using rule-based classifiers on a smaller number of groups and distance similarities such as Cosine similarity or Levenshtein distance. Furthermore, when applied to the publicly available MIMIC-III dataset, our approach also performs excellently. All these findings show that the proposed standardization technique can be very beneficial for clinical big data research, particularly for national clinical research data hubs in low- and middle-income countries.
Elsevier 2023-11-08 /pmc/articles/PMC10685145/ /pubmed/38034661 http://dx.doi.org/10.1016/j.heliyon.2023.e21523 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685145/ https://www.ncbi.nlm.nih.gov/pubmed/38034661 http://dx.doi.org/10.1016/j.heliyon.2023.e21523 |
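
The description above outlines a two-stage method: a classifier assigns each raw test result to a predefined group, and Jaro-Winkler similarity then maps the text to a standard term within that group. The following is a minimal Python sketch of the second stage only, not the authors' implementation. It assumes the group has already been predicted by the classifier; the `STANDARD_TERMS` table, the group names, the `standardize` helper, and the 0.85 threshold are all hypothetical and chosen purely for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical groups and standard terms (assumptions, not from the paper).
STANDARD_TERMS = {
    "presence": ["positive", "negative"],
    "reactivity": ["reactive", "non-reactive"],
}


def standardize(raw: str, group: str, threshold: float = 0.85):
    """Return the closest standard term in the group, or None if below threshold."""
    cleaned = raw.strip().lower()
    best_term, best_score = None, 0.0
    for term in STANDARD_TERMS[group]:
        score = jaro_winkler(cleaned, term)
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None


if __name__ == "__main__":
    for raw, group in [("Positve", "presence"), ("Non Reactive", "reactivity")]:
        print(raw, "->", standardize(raw, group))
```

Restricting the comparison to the terms of one group keeps the candidate list short, which is presumably why the classification step comes first. Results that fall below the threshold return `None` here so they can be routed to manual review rather than silently mapped.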