Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study

BACKGROUND: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, or inconsistencies. To overcome this drawback, stan...

Bibliographic Details
Main Authors:	Kim, Mina; Shin, Soo-Yong; Kang, Mira; Yi, Byoung-Kee; Chang, Dong Kyung
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2019
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6740165/ https://www.ncbi.nlm.nih.gov/pubmed/31469075 http://dx.doi.org/10.2196/14083

_version_	1783451048128544768
author	Kim, Mina Shin, Soo-Yong Kang, Mira Yi, Byoung-Kee Chang, Dong Kyung
author_facet	Kim, Mina Shin, Soo-Yong Kang, Mira Yi, Byoung-Kee Chang, Dong Kyung
author_sort	Kim, Mina
collection	PubMed
description	BACKGROUND: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, EHR data remain difficult to standardize because of nonidentical duplicates, typographical errors, and inconsistencies. To overcome this drawback, standardization efforts have been undertaken both for collecting data in a standardized format and for curating the data already stored in EHRs. To perform clinical big data research, the data stored in EHRs should be standardized, starting with laboratory results, given their importance. However, most previous efforts have been based on labor-intensive manual methods. OBJECTIVE: We aimed to develop an automatic standardization method that eliminates noise in categorical laboratory data, groups the cleaned data, and maps them using standard terminology. METHODS: We developed a method called standardization algorithm for laboratory test–categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of 5 steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned. RESULTS: The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Apart from the data whose original meaning could not be interpreted correctly (eg, ** and _^), SALT-C mapped unique raw data to the correct reference value for each group with an accuracy of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests). CONCLUSIONS: The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can be beneficial for clinical big data research by reducing laborious manual standardization efforts.
format	Online Article Text
id	pubmed-6740165
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-6740165 2019-09-23 Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study Kim, Mina Shin, Soo-Yong Kang, Mira Yi, Byoung-Kee Chang, Dong Kyung JMIR Med Inform Original Paper BACKGROUND: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, EHR data remain difficult to standardize because of nonidentical duplicates, typographical errors, and inconsistencies. To overcome this drawback, standardization efforts have been undertaken both for collecting data in a standardized format and for curating the data already stored in EHRs. To perform clinical big data research, the data stored in EHRs should be standardized, starting with laboratory results, given their importance. However, most previous efforts have been based on labor-intensive manual methods. OBJECTIVE: We aimed to develop an automatic standardization method that eliminates noise in categorical laboratory data, groups the cleaned data, and maps them using standard terminology. METHODS: We developed a method called standardization algorithm for laboratory test–categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of 5 steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned. RESULTS: The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Apart from the data whose original meaning could not be interpreted correctly (eg, ** and _^), SALT-C mapped unique raw data to the correct reference value for each group with an accuracy of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests). CONCLUSIONS: The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can be beneficial for clinical big data research by reducing laborious manual standardization efforts. JMIR Publications 2019-08-29 /pmc/articles/PMC6740165/ /pubmed/31469075 http://dx.doi.org/10.2196/14083 Text en ©Mina Kim, Soo-Yong Shin, Mira Kang, Byoung-Kee Yi, Dong Kyung Chang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 29.08.2019. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Kim, Mina Shin, Soo-Yong Kang, Mira Yi, Byoung-Kee Chang, Dong Kyung Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title	Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title_full	Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title_fullStr	Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title_full_unstemmed	Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title_short	Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study
title_sort	developing a standardization algorithm for categorical laboratory tests for clinical big data research: retrospective study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6740165/ https://www.ncbi.nlm.nih.gov/pubmed/31469075 http://dx.doi.org/10.2196/14083
work_keys_str_mv	AT kimmina developingastandardizationalgorithmforcategoricallaboratorytestsforclinicalbigdataresearchretrospectivestudy AT shinsooyong developingastandardizationalgorithmforcategoricallaboratorytestsforclinicalbigdataresearchretrospectivestudy AT kangmira developingastandardizationalgorithmforcategoricallaboratorytestsforclinicalbigdataresearchretrospectivestudy AT yibyoungkee developingastandardizationalgorithmforcategoricallaboratorytestsforclinicalbigdataresearchretrospectivestudy AT changdongkyung developingastandardizationalgorithmforcategoricallaboratorytestsforclinicalbigdataresearchretrospectivestudy
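
The five steps named in the METHODS part of the abstract (clean, group, vectorize, compare, assign the closest value) can be illustrated with a minimal Python sketch. This is not the authors' SALT-C implementation: the abstract does not state the cleaning rules, the vectorization scheme, the similarity measure, or the contents of the value sets. The sketch assumes character-bigram count vectors and cosine similarity, and the names VALUE_SETS, TEST_GROUPS, clean, vectorize, cosine, and standardize are all hypothetical, as are the example reference values and test names.

```python
import math
import re
from collections import Counter

# Hypothetical reference value sets. The abstract names the groups
# but does not list their values; only two groups are sketched here.
VALUE_SETS = {
    "urine_color": ["yellow", "red", "brown", "colorless"],
    "presence_finding": ["positive", "negative"],
}

# Hypothetical lookup from a test name to its predefined group (step 2).
TEST_GROUPS = {
    "urine color": "urine_color",
    "hbsag": "presence_finding",
}


def clean(raw: str) -> str:
    """Step 1 (assumed rules): lowercase, replace characters other than
    letters, digits, '+', and spaces with a space, then collapse whitespace."""
    text = re.sub(r"[^a-z0-9+ ]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def vectorize(text: str) -> Counter:
    """Step 3 (assumed scheme): counts of character bigrams,
    with one space of padding on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Step 4 (assumed measure): cosine similarity of two count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def standardize(test_name: str, raw: str) -> str:
    """Steps 2 and 5: pick the group, then return the closest reference value."""
    group = TEST_GROUPS[test_name.lower()]
    vec = vectorize(clean(raw))
    return max(VALUE_SETS[group], key=lambda ref: cosine(vec, vectorize(ref)))
```

Under these assumptions, standardize("urine color", "Reddish") returns "red" and standardize("HBsAg", "pos +") returns "positive". A practical version would also need a similarity threshold so that uninterpretable inputs such as ** are rejected rather than mapped to the nearest value.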

Similar Items