
Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high dimensional and high noise genomic data is prone to over-fitting. Models constructe...


Bibliographic Details
Main Authors:	Liu, Jiangang; Jolly, Robert A.; Smith, Aaron T.; Searfoss, George H.; Goldstein, Keith M.; Uversky, Vladimir N.; Dunker, Keith; Li, Shuyu; Thomas, Craig E.; Wei, Tao
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3174148/ https://www.ncbi.nlm.nih.gov/pubmed/21935387 http://dx.doi.org/10.1371/journal.pone.0024233

author	Liu, Jiangang Jolly, Robert A. Smith, Aaron T. Searfoss, George H. Goldstein, Keith M. Uversky, Vladimir N. Dunker, Keith Li, Shuyu Thomas, Craig E. Wei, Tao
author_sort	Liu, Jiangang
collection	PubMed
description	Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates development of a multiplex assay such as qRT-PCR as a biomarker, and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
format	Online Article Text
id	pubmed-3174148
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3174148 2011-09-20 PLoS One Research Article Public Library of Science 2011-09-15 /pmc/articles/PMC3174148/ /pubmed/21935387 http://dx.doi.org/10.1371/journal.pone.0024233 Text en Liu et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3174148/ https://www.ncbi.nlm.nih.gov/pubmed/21935387 http://dx.doi.org/10.1371/journal.pone.0024233
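
The description above outlines an iterative two-way bootstrapping procedure in which each round uses fewer transcripts than samples. The following is a minimal Python sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `ppea_sketch`, the parameters `n_iter` and `n_genes`, the least-squares linear classifier, the use of out-of-bag samples for testing, and the scoring rule (mean test accuracy of the rounds in which a transcript was drawn).

```python
import numpy as np


def ppea_sketch(X, y, n_iter=500, n_genes=5, seed=0):
    """Illustrative two-way bootstrap ranking (hypothetical, not the published PPEA).

    X: array of shape (n_samples, n_transcripts) with expression values.
    y: array of shape (n_samples,) with binary labels 0/1.
    Returns one score per transcript; higher means more predictive.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_transcripts = X.shape
    score_sum = np.zeros(n_transcripts)
    times_drawn = np.zeros(n_transcripts)

    for _ in range(n_iter):
        # Way 1: bootstrap the samples; samples never drawn form the test set.
        train = rng.integers(0, n_samples, size=n_samples)
        test = np.setdiff1d(np.arange(n_samples), train)
        if test.size == 0 or np.unique(y[train]).size < 2:
            continue  # nothing to test on, or only one class to learn from

        # Way 2: draw a random transcript subset that is strictly smaller
        # than the number of distinct training samples.
        k = min(n_genes, np.unique(train).size - 1, n_transcripts)
        genes = rng.choice(n_transcripts, size=k, replace=False)

        # Assumed model: least-squares fit with an intercept, thresholded at 0.5.
        A = np.column_stack([np.ones(train.size), X[np.ix_(train, genes)]])
        coef, *_ = np.linalg.lstsq(A, y[train].astype(float), rcond=None)
        B = np.column_stack([np.ones(test.size), X[np.ix_(test, genes)]])
        pred = (B @ coef > 0.5).astype(int)

        # Assumed scoring: credit every drawn transcript with the test accuracy.
        score_sum[genes] += np.mean(pred == y[test])
        times_drawn[genes] += 1

    # Average accuracy per transcript; transcripts never drawn score 0.
    return np.divide(score_sum, times_drawn,
                     out=np.zeros(n_transcripts), where=times_drawn > 0)
```

Sorting the returned scores in descending order gives a rank order of transcripts. Because every transcript in a subset shares the credit for that round, an informative transcript stands out only after enough iterations for it to appear alongside many different partners.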


Similar Items