Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability

BACKGROUND: Circular fingerprints were first introduced more than 50 years ago, yet they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a c...

Bibliographic Details
Main authors: Gütlein, Martin; Kramer, Stefan
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088672/
https://www.ncbi.nlm.nih.gov/pubmed/27853484
http://dx.doi.org/10.1186/s13321-016-0173-z
author Gütlein, Martin
Kramer, Stefan
collection PubMed
description BACKGROUND: Circular fingerprints were first introduced more than 50 years ago, yet they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling.
RESULTS: We show that it can be preferable to build (Q)SAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of using folded fingerprints or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm, filtering yields about equally predictive (Q)SAR models. We demonstrate the suitability of filtered fingerprints for (Q)SAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints, which provides rationales for predictions by highlighting important structural features in the query compound (see http://coffer.informatik.uni-mainz.de).
CONCLUSIONS: Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. However, to avoid losing interpretability, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort when working with all fingerprint fragments. Additionally, our experiments suggest that the area under the precision-recall curve is a more sensible statistic for validating (Q)SAR models for virtual screening than the area under the ROC curve or other measures for early recognition.
GRAPHICAL ABSTRACT: [Image: see text]
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0173-z) contains supplementary material, which is available to authorized users.
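The contrast between folding and filtering described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the integer fragment ids, the modulo folding scheme, and the class-rate-difference score used for filtering are all assumptions chosen for brevity, and the paper's actual hashing and supervised feature selection may differ.

```python
from collections import defaultdict


def fold(fragment_ids, n_bits):
    """Fold fragment ids into n_bits positions by modulo (an assumed scheme).

    Returns a dict mapping bit position -> sorted list of fragment ids setting it.
    """
    buckets = defaultdict(set)
    for fid in fragment_ids:
        buckets[fid % n_bits].add(fid)
    return {pos: sorted(ids) for pos, ids in buckets.items()}


def collisions(buckets):
    """Bit positions shared by more than one distinct fragment."""
    return {pos: ids for pos, ids in buckets.items() if len(ids) > 1}


def filter_fragments(molecules, labels, top_k):
    """Keep the top_k fragments whose occurrence rate differs most between classes.

    molecules: list of sets of fragment ids; labels: list of 0/1 class labels.
    The scoring rule is a hypothetical stand-in for supervised feature selection.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_count = defaultdict(int)
    neg_count = defaultdict(int)
    for frags, label in zip(molecules, labels):
        for fid in frags:
            if label == 1:
                pos_count[fid] += 1
            else:
                neg_count[fid] += 1
    all_frags = set(pos_count) | set(neg_count)

    def score(fid):
        return abs(pos_count[fid] / n_pos - neg_count[fid] / n_neg)

    # Sort by descending score; break ties by fragment id for determinism.
    ranked = sorted(all_frags, key=lambda fid: (-score(fid), fid))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy data: each molecule is a set of fragment ids.
    molecules = [{3, 42}, {3, 7}, {11, 42}, {19, 7}]
    labels = [1, 1, 0, 0]
    all_ids = set().union(*molecules)
    print("colliding bits:", collisions(fold(all_ids, n_bits=8)))
    print("selected fragments:", filter_fragments(molecules, labels, top_k=1))
```

In the folded representation, a set bit no longer identifies a single fragment, whereas each filtered feature still corresponds to exactly one fragment and can therefore be highlighted in a query compound.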
format Online
Article
Text
id pubmed-5088672
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5088672 2016-11-16 Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability Gütlein, Martin Kramer, Stefan J Cheminform Research Article Springer International Publishing 2016-10-31 /pmc/articles/PMC5088672/ /pubmed/27853484 http://dx.doi.org/10.1186/s13321-016-0173-z Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability
topic Research Article