Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset

Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for...

Bibliographic Details
Main Authors: Li, Ying, Wang, Nan, Perkins, Edward J., Zhang, Chaoyang, Gong, Ping
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965664/
https://www.ncbi.nlm.nih.gov/pubmed/21060837
http://dx.doi.org/10.1371/journal.pone.0013715
author Li, Ying
Wang, Nan
Perkins, Edward J.
Zhang, Chaoyang
Gong, Ping
author_facet Li, Ying
Wang, Nan
Perkins, Edward J.
Zhang, Chaoyang
Gong, Ping
author_sort Li, Ying
collection PubMed
description Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-mean clustering method were applied to independently refine the classifier, producing a smaller subset of 39 and 30 classifier genes, separately, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.
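The gene counts in the description can be checked with a little set arithmetic. The sketch below is only an illustration: the counts come from the abstract, while the gene identifiers (`common_0`, `svm_only_0`, and so on) and the helper `union_size` are hypothetical placeholders, since the record does not list the actual genes. It shows why 39 MC-SVM genes and 30 clustering genes with 11 in common combine into a refined subset of 58, and how much of the array each filtering stage retains.

```python
# Gene counts as reported in the abstract.
TOTAL_PROBES = 15208             # unique oligo probes on the array
DIFFERENTIALLY_EXPRESSED = 869   # after univariate class comparison
TREE_SELECTED = 354              # after decision tree-based selection
SVM_GENES = 39                   # refined by MC-SVM
CLUSTERING_GENES = 30            # refined by K-means clustering
COMMON_GENES = 11                # shared by both refined subsets


def union_size(size_a, size_b, size_common):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return size_a + size_b - size_common


# Hypothetical placeholder identifiers; the real gene names are not in this record.
common = {f"common_{i}" for i in range(COMMON_GENES)}
svm_set = common | {f"svm_only_{i}" for i in range(SVM_GENES - COMMON_GENES)}
clustering_set = common | {
    f"kmeans_only_{i}" for i in range(CLUSTERING_GENES - COMMON_GENES)
}

# The refined subset is the union of the two independently refined subsets.
refined = svm_set | clustering_set
combined = union_size(SVM_GENES, CLUSTERING_GENES, COMMON_GENES)

# Share of the array retained at each stage of the pipeline.
funnel = [
    ("probes on array", TOTAL_PROBES),
    ("differentially expressed", DIFFERENTIALLY_EXPRESSED),
    ("tree-selected classifier genes", TREE_SELECTED),
    ("refined subset", combined),
]
for label, count in funnel:
    print(f"{label}: {count} ({count / TOTAL_PROBES:.2%} of probes)")
```

Building the sets explicitly, rather than only adding the numbers, makes the overlap visible: `len(refined)` and `combined` agree because the 11 shared genes are counted once.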
format Text
id pubmed-2965664
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-29656642010-11-08 Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset Li, Ying Wang, Nan Perkins, Edward J. Zhang, Chaoyang Gong, Ping PLoS One Research Article Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-mean clustering method were applied to independently refine the classifier, producing a smaller subset of 39 and 30 classifier genes, separately, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. 
This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes. Public Library of Science 2010-10-28 /pmc/articles/PMC2965664/ /pubmed/21060837 http://dx.doi.org/10.1371/journal.pone.0013715 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
spellingShingle Research Article
Li, Ying
Wang, Nan
Perkins, Edward J.
Zhang, Chaoyang
Gong, Ping
Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset
title Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965664/
https://www.ncbi.nlm.nih.gov/pubmed/21060837
http://dx.doi.org/10.1371/journal.pone.0013715