An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance

Bibliographic Details
Main authors: Barda, Amie J; Ruiz, Victor M; Gigliotti, Tony; Tsui, Fuchiang (Rich)
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435008/
https://www.ncbi.nlm.nih.gov/pubmed/30944914
http://dx.doi.org/10.1093/jamiaopen/ooy063
collection PubMed
description
OBJECTIVES: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned utilizing local laboratory codes.
MATERIALS AND METHODS: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008–2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012).
RESULTS: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used.
DISCUSSION AND CONCLUSION: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records.
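The following is a minimal sketch, not the authors' pipeline, of why mapping local laboratory codes to LOINC can change the feature space in a multi-site dataset. The site names, local codes, patient identifiers, values, and the helper functions `feature_table` and `feature_names` are all hypothetical and chosen only for illustration; the two LOINC codes shown (2160-0 for serum/plasma creatinine, 2951-2 for serum/plasma sodium) are real. The per-patient mean is an assumed summary feature, since the abstract does not say which summaries were used.

```python
from collections import defaultdict

# Hypothetical lab results: (site, local_code, patient_id, value).
RESULTS = [
    ("site_a", "CREAT", "p1", 1.1),
    ("site_a", "NA", "p1", 139.0),
    ("site_b", "CRE_S", "p2", 1.4),
    ("site_b", "SODIUM", "p2", 141.0),
]

# Hypothetical manual mapping from (site, local code) to a LOINC code.
LOCAL_TO_LOINC = {
    ("site_a", "CREAT"): "2160-0",   # creatinine, serum/plasma
    ("site_b", "CRE_S"): "2160-0",
    ("site_a", "NA"): "2951-2",      # sodium, serum/plasma
    ("site_b", "SODIUM"): "2951-2",
}


def feature_table(results, mapping=None):
    """Return {patient: {feature: mean value}}.

    Without a mapping, each site-specific local code is its own feature.
    With a mapping, results are pooled under the shared LOINC code.
    """
    values = defaultdict(lambda: defaultdict(list))
    for site, code, patient, value in results:
        key = mapping[(site, code)] if mapping else f"{site}:{code}"
        values[patient][key].append(value)
    return {
        patient: {name: sum(v) / len(v) for name, v in feats.items()}
        for patient, feats in values.items()
    }


def feature_names(table):
    """Sorted list of all feature names that appear in the table."""
    return sorted({name for feats in table.values() for name in feats})


local = feature_table(RESULTS)
loinc = feature_table(RESULTS, LOCAL_TO_LOINC)
print(feature_names(local))  # four site-specific features
print(feature_names(loinc))  # two features shared across sites
```

In this toy example the local-code table has four features, each populated for only one site, while the LOINC table has two features populated for both patients. Whether that kind of pooling explains the performance difference reported in the abstract is not something this sketch can show.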
id pubmed-6435008
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Record pubmed-6435008 (2019-04-01). JAMIA Open. Research and Applications. Oxford University Press, 2019-02-04. Text en.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Research and Applications