Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity


Bibliographic Details
Main authors:	Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/ https://www.ncbi.nlm.nih.gov/pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8

_version_	1782313261008420864
author	Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
author_facet	Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
author_sort	Webb, Samuel J
collection	PubMed
description	BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
format	Online Article Text
id	pubmed-3997921
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3997921 2014-05-08 Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D J Cheminform Research Article BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. BioMed Central 2014-03-25 /pmc/articles/PMC3997921/ /pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8 Text en Copyright © 2014 Webb et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
title	Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/ https://www.ncbi.nlm.nih.gov/pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8
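
The abstract describes querying a black box model on substructures of the query and summarising causes of activation and deactivation. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes the query has already been reduced to a set of named fragments, and it enumerates every non-empty combination of them rather than using the paper's fragmentation algorithm. `toy_model`, the fragment names, `build_network` and `summarise` are all hypothetical names introduced for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Node = FrozenSet[str]  # one combination of fragments from the query


def toy_model(features: Node) -> bool:
    """Hypothetical stand-in for a trained classifier (True = active)."""
    return "epoxide" in features or (
        "aromatic_nitro" in features and "sulfonate" not in features
    )


def build_network(
    fragments: Iterable[str], model: Callable[[Node], bool]
) -> Dict[Node, bool]:
    """Predict every non-empty fragment combination (exponential in size)."""
    ordered = sorted(fragments)
    return {
        frozenset(combo): model(frozenset(combo))
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    }


def summarise(network: Dict[Node, bool]) -> Dict[str, List[Tuple[str, ...]]]:
    """Report minimal activating and minimal deactivating combinations."""

    def has_active_subset(node: Node) -> bool:
        return any(active for other, active in network.items() if other < node)

    # Activating: predicted active, with no smaller active combination inside.
    activating = [n for n, a in network.items() if a and not has_active_subset(n)]
    # Deactivating: predicted inactive although a contained combination is active.
    candidates = [n for n, a in network.items() if not a and has_active_subset(n)]
    deactivating = [n for n in candidates if not any(o < n for o in candidates)]

    def as_tuples(nodes: List[Node]) -> List[Tuple[str, ...]]:
        return sorted(tuple(sorted(n)) for n in nodes)

    return {
        "activating": as_tuples(activating),
        "deactivating": as_tuples(deactivating),
    }


query = {"aromatic_nitro", "epoxide", "sulfonate"}
network = build_network(query, toy_model)
summary = summarise(network)
overall_active = network[frozenset(query)]
```

In this toy case the full query is predicted active, two independent activating fragments are reported, and the combination of `aromatic_nitro` with `sulfonate` is reported as a localised deactivation. The model is only queried and never modified, so its predictions are unchanged, which is the property the abstract highlights.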
