Imputation of missing values for cochlear implant candidate audiometric data and potential applications

Bibliographic Details
Main authors: Pavelchek, Cole; Michelson, Andrew P.; Walia, Amit; Ortmann, Amanda; Herzog, Jacques; Buchman, Craig A.; Shew, Matthew A.
Format: Online article, text
Language: English
Published: Public Library of Science, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9901781/
https://www.ncbi.nlm.nih.gov/pubmed/36745652
http://dx.doi.org/10.1371/journal.pone.0281337
Collection: PubMed
Description:

OBJECTIVE: Assess the real-world performance of popular imputation algorithms on cochlear implant (CI) candidate audiometric data.

METHODS: 7,451 audiograms from patients undergoing CI candidacy evaluation were pooled from 32 institutions; complete case analysis yielded 1,304 audiograms. Imputation model performance was assessed with nested cross-validation on randomly generated sparse datasets with various amounts of missing data, distributions of sparsity, and dataset sizes. A threshold for safe imputation was defined as root mean square error (RMSE) < 10 dB. Models included univariate imputation, interpolation, multiple imputation by chained equations (MICE), k-nearest neighbors, gradient boosted trees, and neural networks.

RESULTS: Greater quantities of missing data were associated with worse performance. Sparsity in audiometric data is not uniformly distributed, as inter-octave frequencies are less commonly tested. With 3–8 missing features per instance, a real-world sparsity distribution was associated with significantly better performance than other sparsity distributions (ΔRMSE 0.3–5.8 dB, non-overlapping 99% confidence intervals). With a real-world sparsity distribution, models were able to safely impute up to 6 missing data points in an 11-frequency audiogram. MICE consistently outperformed the other models across all metrics and sparsity distributions (p < 0.01, Wilcoxon rank-sum test). With sparsity capped at 6 missing features per audiogram but otherwise equivalent to the raw dataset, MICE imputed with an RMSE of 7.83 dB [95% CI 7.81–7.86]. Imputing up to 6 missing features captures 99.3% of the audiograms in our dataset, allowing a 5.7-fold increase in dataset size (1,304 to 7,399 audiograms) compared with complete case analysis.

CONCLUSION: Precision medicine will inevitably play an integral role in the future of hearing healthcare. These methods are data dependent, and rigorously validated imputation models are a key tool for maximizing datasets. Using the largest CI audiogram dataset to date, we demonstrate that in a real-world scenario MICE can safely impute missing data for the vast majority (>99%) of audiograms with RMSE well below a clinically significant threshold of 10 dB. Evaluation across a range of dataset sizes and sparsity distributions suggests a high degree of generalizability to future applications.
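The scoring idea in the abstract (hide known thresholds, impute them, then compute RMSE only on the hidden entries and compare it with the 10 dB safety threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses linear interpolation, one of the baseline models the abstract lists, rather than MICE, and it scores a single audiogram instead of running nested cross-validation over many randomly generated sparse datasets. The 11 test frequencies in `FREQS_HZ`, interpolating in log2-frequency space, nearest-value filling at the edges, the helper names `interpolate_audiogram` and `masked_rmse`, and the example audiogram are all hypothetical. For a MICE-style multivariate imputer, scikit-learn offers `IterativeImputer`, which its documentation describes as inspired by the R MICE package.

```python
import math

# Hypothetical 11-frequency layout (Hz); the record only says "11-frequency audiogram".
FREQS_HZ = [125, 250, 500, 750, 1000, 1500, 2000, 3000, 4000, 6000, 8000]
SAFE_RMSE_DB = 10.0  # "safe imputation" threshold stated in the abstract


def interpolate_audiogram(thresholds):
    """Fill None entries by linear interpolation in log2(frequency).

    Gaps below the lowest or above the highest measured frequency are
    filled with the nearest measured value (an assumption, not from the paper).
    """
    x = [math.log2(f) for f in FREQS_HZ]
    known = [i for i, v in enumerate(thresholds) if v is not None]
    if not known:
        raise ValueError("audiogram has no measured thresholds")
    filled = list(thresholds)
    for i, v in enumerate(thresholds):
        if v is not None:
            continue
        left = [k for k in known if k < i]
        right = [k for k in known if k > i]
        if left and right:
            # Interpolate between the nearest measured neighbours.
            a, b = left[-1], right[0]
            t = (x[i] - x[a]) / (x[b] - x[a])
            filled[i] = thresholds[a] + t * (thresholds[b] - thresholds[a])
        else:
            # Edge gap: copy the nearest measured threshold.
            filled[i] = thresholds[left[-1] if left else right[0]]
    return filled


def masked_rmse(truth, imputed, mask):
    """RMSE in dB computed only over the entries that were hidden (mask True)."""
    sq_errors = [(t - p) ** 2 for t, p, m in zip(truth, imputed, mask) if m]
    if not sq_errors:
        raise ValueError("no masked entries to score")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical complete audiogram (dB HL), NOT data from the study.
truth = [30, 35, 45, 55, 60, 70, 75, 85, 90, 95, 100]
# Hide the inter-octave frequencies (750, 1500, 3000, 6000 Hz), which the
# abstract notes are the ones tested less often in practice.
mask = [f in (750, 1500, 3000, 6000) for f in FREQS_HZ]
sparse = [None if m else v for v, m in zip(truth, mask)]

imputed = interpolate_audiogram(sparse)
rmse = masked_rmse(truth, imputed, mask)
print(f"RMSE on hidden entries: {rmse:.2f} dB, safe: {rmse < SAFE_RMSE_DB}")

# Dataset-size arithmetic reported in the abstract.
fold_increase = 7399 / 1304          # complete-case size -> size after imputation
share_captured = 100 * 7399 / 7451   # audiograms with at most 6 missing features
print(f"{fold_increase:.1f}-fold increase, {share_captured:.1f}% of audiograms")
```

The last two lines only re-derive the abstract's own figures: 7,399 / 1,304 rounds to the reported 5.7-fold increase, and 7,399 / 7,451 rounds to 99.3%.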
ID: pubmed-9901781
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One
Article type: Research Article
Publication date: 2023-02-06
License: © 2023 Pavelchek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.