
Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma

Bibliographic Details
Main authors:	Jordan, Rick; Patel, Satish; Hu, Hai; Lyons-Weiler, James
Format:	Text
Language:	English
Published:	Libertas Academica, 2008
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2623303/ https://www.ncbi.nlm.nih.gov/pubmed/19259419

author	Jordan, Rick; Patel, Satish; Hu, Hai; Lyons-Weiler, James
collection	PubMed
description	In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed with caGEDA (http://bioinformatics2.pitt.edu/GE2/GEDA.html) to measure the degree of differential expression of genes between two populations. The datasets were randomly split into at least two subsets, each subset was analyzed for genes differentially expressed between the two sample groups, and the resulting gene lists were compared for overlapping genes. Efficiency Analysis is an intuitive method that compares, over a range of testing methods, the percentage of overlap among the gene lists that the same test finds in two or more data subsets. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the top 100 genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was performed using all normalization and transformation methods available through caGEDA. The ‘best’ test was then further evaluated using internal cross-validation to estimate generalizable sample classification errors with a Naïve Bayes classification algorithm. A novel test, termed D1 (a derivative of the J5 test), was found to be the most consistent and to exhibit the lowest overall classification error and the highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g., expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes are scaled to share the same intensity distribution range. Efficiency Analysis correctly predicted the ‘best’ test and normalization method using the Beer dataset and also performed well with the Bhattacharjee dataset based on both efficiency and classification accuracy criteria.
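The split-and-compare procedure in the description can be sketched as below. This is a minimal illustration under stated assumptions, not caGEDA's implementation: the two scoring functions (a plain mean difference and Welch's t) are stand-ins and are NOT the J5 or D1 tests; the stratified half-split, the tie-breaking rule, the function names (`split_samples`, `top_genes`, `percent_overlap`, `efficiency`, `make_synthetic`) and the toy data are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical type: a "test" scores one gene from its values in group A and group B.
Test = Callable[[Sequence[float], Sequence[float]], float]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


# Stand-in tests for illustration only; NOT the J5 or D1 statistics from the paper.
def mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference of group means."""
    return mean(a) - mean(b)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic; needs at least two samples per group."""
    ma, mb = mean(a), mean(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0  # degenerate case: zero variance


def split_samples(labels: Sequence[str], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into two halves, stratified by class label (assumption)."""
    first: List[int] = []
    second: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        first.extend(idx[:half])
        second.extend(idx[half:])
    return first, second


def top_genes(data: Dict[str, List[float]], labels: Sequence[str],
              samples: Sequence[int], test: Test, n_top: int) -> Set[str]:
    """Score every gene using only `samples`; return the n_top genes with largest |score|."""
    groups = sorted(set(labels))  # assumes exactly two classes
    a = [s for s in samples if labels[s] == groups[0]]
    b = [s for s in samples if labels[s] == groups[1]]
    scores = {g: abs(test([v[s] for s in a], [v[s] for s in b])) for g, v in data.items()}
    ranked = sorted(scores, key=lambda g: (-scores[g], g))  # ties broken by gene id
    return set(ranked[:n_top])


def percent_overlap(list1: Set[str], list2: Set[str], n_top: int) -> float:
    """Percentage of the n_top genes shared by two top-gene lists."""
    return 100.0 * len(list1 & list2) / n_top


def efficiency(data: Dict[str, List[float]], labels: Sequence[str], test: Test,
               n_top: int = 100, n_splits: int = 20, seed: int = 0) -> float:
    """Mean percent overlap of top-n_top lists found by `test` on independent random splits."""
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_splits):
        s1, s2 = split_samples(labels, rng)
        overlaps.append(percent_overlap(top_genes(data, labels, s1, test, n_top),
                                        top_genes(data, labels, s2, test, n_top), n_top))
    return mean(overlaps)


def make_synthetic(n_genes: int, n_de: int, n_per_class: int, shift: float,
                   rng: random.Random) -> Tuple[Dict[str, List[float]], List[str]]:
    """Hypothetical toy data: the first n_de genes are shifted upward in class A."""
    labels = ["A"] * n_per_class + ["B"] * n_per_class
    data = {}
    for g in range(n_genes):
        delta = shift if g < n_de else 0.0
        data[f"g{g}"] = [rng.gauss(0.0, 1.0) + (delta if lab == "A" else 0.0) for lab in labels]
    return data, labels


if __name__ == "__main__":
    toy_data, toy_labels = make_synthetic(2000, 100, 10, 1.5, random.Random(1))
    for name, fn in [("mean difference", mean_difference), ("Welch t", welch_t)]:
        print(name, round(efficiency(toy_data, toy_labels, fn, n_top=100, n_splits=10), 1))
```

Two top-100 lists sharing 50 genes give `percent_overlap(...) == 50.0`, and lists sharing 5 genes give `5.0`, mirroring the 50% versus 5% example above; under this sketch the test with the higher mean overlap across splits would be preferred. Normalization and transformation methods could be compared the same way by applying each to `data` before calling `efficiency`.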
format	Text
id	pubmed-2623303
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-2623303 | 2009-02-24 | Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma | Jordan, Rick; Patel, Satish; Hu, Hai; Lyons-Weiler, James | Cancer Inform | Original Article | Libertas Academica 2008-07-14 | /pmc/articles/PMC2623303/ /pubmed/19259419 | Text | en | © 2008 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title	Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2623303/ https://www.ncbi.nlm.nih.gov/pubmed/19259419

Similar Items