Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi


Bibliographic Details
Main Authors:	To, Kimberly T., Fry, Rebecca C., Reif, David M.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998548/ https://www.ncbi.nlm.nih.gov/pubmed/29942350 http://dx.doi.org/10.1186/s13040-018-0169-5

author	To, Kimberly T.; Fry, Rebecca C.; Reif, David M.
author_facet	To, Kimberly T.; Fry, Rebecca C.; Reif, David M.
author_sort	To, Kimberly T.
collection	PubMed
description	BACKGROUND: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources (“assays”), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry’s (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources. RESULTS: Our simulations explored a wide range of scenarios concerning data (0-80% assay data missing per chemical), modeling (ToxPi models containing from 160-700 different assays), and imputation method (k-Nearest-Neighbor, Max, Mean, Min, Binomial, Local Least Squares, and Singular Value Decomposition). We find that most imputation methods result in significant changes to ToxPi score, except for datasets with a small number of assays. If we consider rank change conditional on these significant changes to ToxPi score, we find that ranks of chemicals in the minimum value imputation, SVD imputation, and kNN imputation sets are more sensitive to the score changes. CONCLUSIONS: We found that the choice of imputation strategy exerted significant influence over both scores and associated ranks, and the most sensitive scenarios were those involving fewer assays plus higher proportions of missing data. By characterizing the effects of missing data and the relative benefit of imputation approaches across real-world data scenarios, we can augment confidence in the robustness of decisions regarding the health and ecological effects of environmental chemicals.
format	Online Article Text
id	pubmed-5998548
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5998548 2018-06-25 Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi To, Kimberly T. Fry, Rebecca C. Reif, David M. BioData Min Research BACKGROUND: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources (“assays”), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry’s (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources. RESULTS: Our simulations explored a wide range of scenarios concerning data (0-80% assay data missing per chemical), modeling (ToxPi models containing from 160-700 different assays), and imputation method (k-Nearest-Neighbor, Max, Mean, Min, Binomial, Local Least Squares, and Singular Value Decomposition). We find that most imputation methods result in significant changes to ToxPi score, except for datasets with a small number of assays. If we consider rank change conditional on these significant changes to ToxPi score, we find that ranks of chemicals in the minimum value imputation, SVD imputation, and kNN imputation sets are more sensitive to the score changes. CONCLUSIONS: We found that the choice of imputation strategy exerted significant influence over both scores and associated ranks, and the most sensitive scenarios were those involving fewer assays plus higher proportions of missing data. By characterizing the effects of missing data and the relative benefit of imputation approaches across real-world data scenarios, we can augment confidence in the robustness of decisions regarding the health and ecological effects of environmental chemicals. BioMed Central 2018-06-13 /pmc/articles/PMC5998548/ /pubmed/29942350 http://dx.doi.org/10.1186/s13040-018-0169-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research To, Kimberly T. Fry, Rebecca C. Reif, David M. Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi
title	Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998548/ https://www.ncbi.nlm.nih.gov/pubmed/29942350 http://dx.doi.org/10.1186/s13040-018-0169-5
