A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications
The quality of data used for QSAR model derivation is extremely important as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in calculation of descriptors, hence leading to meaningless...
Main authors: | Gadaleta, Domenico; Lombardo, Anna; Toma, Cosimo; Benfenati, Emilio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6503381/ https://www.ncbi.nlm.nih.gov/pubmed/30536051 http://dx.doi.org/10.1186/s13321-018-0315-6 |
_version_ | 1783416399094349824 |
---|---|
author | Gadaleta, Domenico Lombardo, Anna Toma, Cosimo Benfenati, Emilio |
author_facet | Gadaleta, Domenico Lombardo, Anna Toma, Cosimo Benfenati, Emilio |
author_sort | Gadaleta, Domenico |
collection | PubMed |
description | The quality of data used for QSAR model derivation is extremely important as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, and selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. The workflow has been implemented with the freely available KNIME software and is freely available to the cheminformatics community for improvement and application to a broad range of chemical datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-018-0315-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6503381 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-6503381 2019-05-10 A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications Gadaleta, Domenico Lombardo, Anna Toma, Cosimo Benfenati, Emilio J Cheminform Research Article The quality of data used for QSAR model derivation is extremely important as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, and selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. The workflow has been implemented with the freely available KNIME software and is freely available to the cheminformatics community for improvement and application to a broad range of chemical datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-018-0315-6) contains supplementary material, which is available to authorized users.
Springer International Publishing 2018-12-10 /pmc/articles/PMC6503381/ /pubmed/30536051 http://dx.doi.org/10.1186/s13321-018-0315-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Gadaleta, Domenico Lombardo, Anna Toma, Cosimo Benfenati, Emilio A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title | A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title_full | A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title_fullStr | A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title_full_unstemmed | A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title_short | A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
title_sort | new semi-automated workflow for chemical data retrieval and quality checking for modeling applications |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6503381/ https://www.ncbi.nlm.nih.gov/pubmed/30536051 http://dx.doi.org/10.1186/s13321-018-0315-6 |