Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do

Numerous chemical data sets have become available for quantitative structure–activity relationship (QSAR) modeling studies. However, the quality of different data sources may be different based on the nature of experimental protocols. Therefore, potential experimental errors in the...

Bibliographic Details
Main Authors:	Zhao, Linlin; Wang, Wenyi; Sedykh, Alexander; Zhu, Hao
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2017
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5494643/ https://www.ncbi.nlm.nih.gov/pubmed/28691113 http://dx.doi.org/10.1021/acsomega.7b00274

_version_	1783247705827442688
author	Zhao, Linlin Wang, Wenyi Sedykh, Alexander Zhu, Hao
author_facet	Zhao, Linlin Wang, Wenyi Sedykh, Alexander Zhu, Hao
author_sort	Zhao, Linlin
collection	PubMed
description	Numerous chemical data sets have become available for quantitative structure–activity relationship (QSAR) modeling studies. However, the quality of different data sources may be different based on the nature of experimental protocols. Therefore, potential experimental errors in the modeling sets may lead to the development of poor QSAR models and further affect the predictions of new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 various QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold cross-validation process was used to evaluate the modeling performance, which deteriorates when the ratio of experimental errors increases. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that the compounds with relatively large prediction errors in cross-validation processes are likely to be those with simulated experimental errors. However, after removing a certain number of compounds with large prediction errors in the cross-validation process, the external predictions of new compounds did not show improvement. Our conclusion is that the QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors. But removing those compounds by the cross-validation procedure is not a reasonable means to improve model predictivity due to overfitting.
format	Online Article Text
id	pubmed-5494643
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-5494643 2017-07-05 Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do Zhao, Linlin Wang, Wenyi Sedykh, Alexander Zhu, Hao ACS Omega American Chemical Society 2017-06-19 /pmc/articles/PMC5494643/ /pubmed/28691113 http://dx.doi.org/10.1021/acsomega.7b00274 Text en Copyright © 2017 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
spellingShingle	Zhao, Linlin Wang, Wenyi Sedykh, Alexander Zhu, Hao Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title	Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title_full	Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title_fullStr	Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title_full_unstemmed	Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title_short	Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
title_sort	experimental errors in qsar modeling sets: what we can do and what we cannot do
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5494643/ https://www.ncbi.nlm.nih.gov/pubmed/28691113 http://dx.doi.org/10.1021/acsomega.7b00274
work_keys_str_mv	AT zhaolinlin experimentalerrorsinqsarmodelingsetswhatwecandoandwhatwecannotdo AT wangwenyi experimentalerrorsinqsarmodelingsetswhatwecandoandwhatwecannotdo AT sedykhalexander experimentalerrorsinqsarmodelingsetswhatwecandoandwhatwecannotdo AT zhuhao experimentalerrorsinqsarmodelingsetswhatwecandoandwhatwecannotdo
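
The procedure summarized in the description field (randomizing the activities of part of the compounds, running fivefold cross-validation, and inspecting which compounds are predicted poorly) can be illustrated with a small sketch. This is not the authors' code or data: it assumes a synthetic two-descriptor set with a binary endpoint, a k-nearest-neighbor classifier in place of the QSAR models used in the study, and label flipping as the error model. All function names are hypothetical and only the Python standard library is used.

```python
import random


def make_synthetic_set(n, rng):
    """Toy 'compounds': two descriptors each; class 0 near (0, 0), class 1 near (6, 6)."""
    descriptors, activities = [], []
    for i in range(n):
        label = i % 2
        center = 6.0 * label
        descriptors.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        activities.append(label)
    return descriptors, activities


def simulate_errors(activities, ratio, rng):
    """Flip the binary activity of a random fraction of compounds (assumed error model)."""
    n_errors = round(ratio * len(activities))
    corrupted = set(rng.sample(range(len(activities)), n_errors))
    noisy = [1 - a if i in corrupted else a for i, a in enumerate(activities)]
    return noisy, corrupted


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training compounds (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda j: (train_x[j][0] - query[0]) ** 2 + (train_x[j][1] - query[1]) ** 2,
    )[:k]
    votes = sum(train_y[j] for j in nearest)
    return 1 if 2 * votes > k else 0


def cross_validated_predictions(x, y, n_folds, k, rng):
    """Predict every compound using only the folds that do not contain it."""
    order = list(range(len(x)))
    rng.shuffle(order)
    predictions = [None] * len(x)
    for fold in range(n_folds):
        held_out = set(order[fold::n_folds])
        train = [i for i in order if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            predictions[i] = knn_predict(train_x, train_y, x[i], k)
    return predictions


rng = random.Random(0)
x, true_y = make_synthetic_set(200, rng)
noisy_y, corrupted = simulate_errors(true_y, ratio=0.10, rng=rng)
cv_pred = cross_validated_predictions(x, noisy_y, n_folds=5, k=5, rng=rng)

# Compounds whose cross-validated prediction disagrees with the recorded activity.
flagged = {i for i in range(len(x)) if cv_pred[i] != noisy_y[i]}
recall = len(flagged & corrupted) / len(corrupted)
precision = len(flagged & corrupted) / len(flagged)
print(f"corrupted={len(corrupted)} flagged={len(flagged)} "
      f"recall={recall:.2f} precision={precision:.2f}")
```

On such cleanly separated toy data, the flagged compounds should be strongly enriched in those with simulated errors. Per the abstract, this kind of flagging is useful for identifying questionable records, but removing the flagged compounds did not improve external predictions in the study, so the sketch should be read as a review aid and not as a data-cleaning recipe.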

Similar Items