Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity

BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretatio...

Bibliographic Details
Main authors: Webb, Samuel J, Hanser, Thierry, Howlin, Brendan, Krause, Paul, Vessey, Jonathan D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/
https://www.ncbi.nlm.nih.gov/pubmed/24661325
http://dx.doi.org/10.1186/1758-2946-6-8
_version_ 1782313261008420864
author Webb, Samuel J
Hanser, Thierry
Howlin, Brendan
Krause, Paul
Vessey, Jonathan D
author_facet Webb, Samuel J
Hanser, Thierry
Howlin, Brendan
Krause, Paul
Vessey, Jonathan D
author_sort Webb, Samuel J
collection PubMed
description BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
format Online
Article
Text
id pubmed-3997921
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-39979212014-05-08 Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D J Cheminform Research Article BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. 
Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. BioMed Central 2014-03-25 /pmc/articles/PMC3997921/ /pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8 Text en Copyright © 2014 Webb et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Webb, Samuel J
Hanser, Thierry
Howlin, Brendan
Krause, Paul
Vessey, Jonathan D
Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title_full Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title_fullStr Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title_full_unstemmed Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title_short Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title_sort feature combination networks for the interpretation of statistical machine learning models: application to ames mutagenicity
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/
https://www.ncbi.nlm.nih.gov/pubmed/24661325
http://dx.doi.org/10.1186/1758-2946-6-8
work_keys_str_mv AT webbsamuelj featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity
AT hanserthierry featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity
AT howlinbrendan featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity
AT krausepaul featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity
AT vesseyjonathand featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity
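
Illustrative note: the abstract in this record describes probing a model on substructures of the query and summarising causes of activation and deactivation. The sketch below is one possible reading of that idea, not the authors' implementation. It replaces the structure fragmentation step with exhaustive enumeration of feature subsets (exponential in the number of features, so suitable only for tiny examples), and the names `feature_combinations`, `interpret`, `toy_model`, `frag_A`, `frag_B` and `frag_C` are all hypothetical.

```python
from itertools import combinations


def feature_combinations(features):
    """Yield every non-empty subset of the query's features, smallest first."""
    ordered = sorted(features)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def interpret(features, predict_active):
    """Return (activating, deactivating) feature combinations.

    Assumed definitions for this sketch only:
    - activating: predicted active, and no proper subset is predicted active;
    - deactivating: predicted inactive although a proper subset is predicted
      active, and no proper subset has already been reported as deactivating.
    """
    # Ask the black-box model about every combination (dict keeps size order).
    predictions = {c: predict_active(c) for c in feature_combinations(features)}

    activating, deactivating = [], []
    for combo, active in predictions.items():
        has_active_subset = any(
            predictions[sub] for sub in predictions if sub < combo
        )
        if active and not has_active_subset:
            activating.append(combo)
        elif not active and has_active_subset:
            # Keep only the smallest explanation of each deactivation.
            if not any(found < combo for found in deactivating):
                deactivating.append(combo)
    return activating, deactivating


def toy_model(features):
    """Hypothetical stand-in for a trained model: frag_A activates,
    frag_B switches the activity off. Not a chemical claim."""
    return "frag_A" in features and "frag_B" not in features


if __name__ == "__main__":
    causes, blockers = interpret({"frag_A", "frag_B", "frag_C"}, toy_model)
    print("activating:", [sorted(c) for c in causes])
    print("deactivating:", [sorted(c) for c in blockers])
```

Because the model is only queried and never retrained, its prediction for the full query is unchanged, which matches the abstract's remark that the interpretation causes no loss in performance.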