Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity
BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretatio...
Main Authors: | Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/ https://www.ncbi.nlm.nih.gov/pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8 |
_version_ | 1782313261008420864 |
---|---|
author | Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D |
author_facet | Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D |
author_sort | Webb, Samuel J |
collection | PubMed |
description | BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. |
format | Online Article Text |
id | pubmed-3997921 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3997921 2014-05-08 Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D J Cheminform Research Article BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. BioMed Central 2014-03-25 /pmc/articles/PMC3997921/ /pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8 Text en Copyright © 2014 Webb et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Webb, Samuel J Hanser, Thierry Howlin, Brendan Krause, Paul Vessey, Jonathan D Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title | Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title_full | Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title_fullStr | Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title_full_unstemmed | Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title_short | Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity |
title_sort | feature combination networks for the interpretation of statistical machine learning models: application to ames mutagenicity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997921/ https://www.ncbi.nlm.nih.gov/pubmed/24661325 http://dx.doi.org/10.1186/1758-2946-6-8 |
work_keys_str_mv | AT webbsamuelj featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity AT hanserthierry featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity AT howlinbrendan featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity AT krausepaul featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity AT vesseyjonathand featurecombinationnetworksfortheinterpretationofstatisticalmachinelearningmodelsapplicationtoamesmutagenicity |