
A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

BACKGROUND: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimens...


Bibliographic Details
Main authors: Judson, Richard; Elloumi, Fathi; Setzer, R Woodrow; Li, Zhen; Shah, Imran
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2409339/
https://www.ncbi.nlm.nih.gov/pubmed/18489778
http://dx.doi.org/10.1186/1471-2105-9-241
author Judson, Richard
Elloumi, Fathi
Setzer, R Woodrow
Li, Zhen
Shah, Imran
collection PubMed
description BACKGROUND: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.
RESULTS: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.
CONCLUSION: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.
format Text
id pubmed-2409339
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2409339 2008-06-04 A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model Judson, Richard Elloumi, Fathi Setzer, R Woodrow Li, Zhen Shah, Imran BMC Bioinformatics Research Article BioMed Central 2008-05-19 /pmc/articles/PMC2409339/ /pubmed/18489778 http://dx.doi.org/10.1186/1471-2105-9-241 Text en Copyright © 2008 Judson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model
topic Research Article