An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance
OBJECTIVES: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to logical observation identifiers names and codes (LOINC) would produce predictive...
Main authors: | Barda, Amie J, Ruiz, Victor M, Gigliotti, Tony, Tsui, Fuchiang (Rich) |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435008/ https://www.ncbi.nlm.nih.gov/pubmed/30944914 http://dx.doi.org/10.1093/jamiaopen/ooy063 |
_version_ | 1783406582464249856 |
---|---|
author | Barda, Amie J Ruiz, Victor M Gigliotti, Tony Tsui, Fuchiang (Rich) |
author_facet | Barda, Amie J Ruiz, Victor M Gigliotti, Tony Tsui, Fuchiang (Rich) |
author_sort | Barda, Amie J |
collection | PubMed |
description | OBJECTIVES: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to logical observation identifiers names and codes (LOINC) would produce predictive models that significantly outperform those learned utilizing local laboratory codes. MATERIALS AND METHODS: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008–2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012). RESULTS: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used. DISCUSSION AND CONCLUSION: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records. |
format | Online Article Text |
id | pubmed-6435008 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-64350082019-04-01 An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance Barda, Amie J Ruiz, Victor M Gigliotti, Tony Tsui, Fuchiang (Rich) JAMIA Open Research and Applications OBJECTIVES: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to logical observation identifiers names and codes (LOINC) would produce predictive models that significantly outperform those learned utilizing local laboratory codes. MATERIALS AND METHODS: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008–2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012). RESULTS: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used. DISCUSSION AND CONCLUSION: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records. Oxford University Press 2019-02-04 /pmc/articles/PMC6435008/ /pubmed/30944914 http://dx.doi.org/10.1093/jamiaopen/ooy063 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Barda, Amie J Ruiz, Victor M Gigliotti, Tony Tsui, Fuchiang (Rich) An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title | An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title_full | An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title_fullStr | An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title_full_unstemmed | An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title_short | An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance |
title_sort | argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of loinc standardization on model performance |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435008/ https://www.ncbi.nlm.nih.gov/pubmed/30944914 http://dx.doi.org/10.1093/jamiaopen/ooy063 |
work_keys_str_mv | AT bardaamiej anargumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT ruizvictorm anargumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT gigliottitony anargumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT tsuifuchiangrich anargumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT bardaamiej argumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT ruizvictorm argumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT gigliottitony argumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance AT tsuifuchiangrich argumentforreportingdatastandardizationproceduresinmultisitepredictivemodelingcasestudyontheimpactofloincstandardizationonmodelperformance |