
Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies

BACKGROUND: The choice of probe set algorithms for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with known concentration are often used to assess the relative performance of probe set algorithms. Given the fact that the spiked...


Bibliographic Details
Main authors: Hu, Zihua; Willsky, Gail R
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1361777/
https://www.ncbi.nlm.nih.gov/pubmed/16403228
http://dx.doi.org/10.1186/1471-2105-7-12
_version_ 1782126723006988288
author Hu, Zihua
Willsky, Gail R
author_facet Hu, Zihua
Willsky, Gail R
author_sort Hu, Zihua
collection PubMed
description BACKGROUND: The choice of probe set algorithms for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with known concentration are often used to assess the relative performance of probe set algorithms. Given the fact that the spiked-in cRNAs do not represent endogenously expressed genes in experiments, it becomes increasingly important to have methods to study whether a particular probe set algorithm is more appropriate for a specific dataset, without using such external reference data. RESULTS: We propose the use of the probe set redundancy feature for evaluating the performance of probe set algorithms, and have presented three approaches for analyzing data variance and result bias using two sample t-test statistics from redundant probe sets. These approaches are as follows: 1) analyzing redundant probe set variance based on t-statistic rank order, 2) computing correlation of t-statistics between redundant probe sets, and 3) analyzing the co-occurrence of replicate redundant probe sets representing differentially expressed genes. We applied these approaches to expression summary data generated from three datasets utilizing individual probe set algorithms of MAS5.0, dChip, or RMA. We also utilized combinations of options from the three probe set algorithms. We found that results from the three approaches were similar within each individual expression summary dataset, and were also in good agreement with previously reported findings by others. We also demonstrate the validity of our findings by independent experimental methods. CONCLUSION: All three proposed approaches allowed us to assess the performance of probe set algorithms using the probe set redundancy feature. The analyses of redundant probe set variance based on t-statistic rank order and correlation of t-statistics between redundant probe sets provide useful tools for data variance analysis, and the co-occurrence of replicate redundant probe sets representing differentially expressed genes allows estimation of result bias. The results also suggest that individual probe set algorithms have dataset-specific performance.
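The second approach named in the description (correlation of t-statistics between redundant probe sets) can be illustrated with a minimal sketch. This is not the authors' implementation: the Welch (unequal-variance) form of the two-sample t-statistic, the function names, and the data layout (`expr`, `gene_to_probesets`, `group_a`, `group_b`) are all assumptions made for illustration, since the record does not specify them.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic; the Welch form is an assumption, not from the record."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den


def redundant_pair_t(expr, gene_to_probesets, group_a, group_b):
    """Return one (t, t) tuple per pair of probe sets that map to the same gene.

    expr: hypothetical dict, probe set ID -> list of expression values per array
    gene_to_probesets: hypothetical dict, gene -> list of probe set IDs
    group_a, group_b: array indices of the two sample groups being compared
    """
    def t_for(probe):
        values = expr[probe]
        return welch_t([values[i] for i in group_a], [values[i] for i in group_b])

    pairs = []
    for probesets in gene_to_probesets.values():
        for p, q in combinations(probesets, 2):
            pairs.append((t_for(p), t_for(q)))
    return pairs


def redundant_t_correlation(pairs):
    """Correlation of t-statistics across all redundant probe set pairs."""
    first, second = zip(*pairs)
    return pearson(first, second)
```

Under these assumptions, a higher correlation for one expression summary than for another computed from the same arrays would indicate that redundant probe sets agree more closely under that probe set algorithm.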
format Text
id pubmed-1361777
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1361777 2006-02-10 Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies Hu, Zihua Willsky, Gail R BMC Bioinformatics Methodology Article BioMed Central 2006-01-10 /pmc/articles/PMC1361777/ /pubmed/16403228 http://dx.doi.org/10.1186/1471-2105-7-12 Text en Copyright © 2006 Hu and Willsky; licensee BioMed Central Ltd.
spellingShingle Methodology Article
Hu, Zihua
Willsky, Gail R
Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title_full Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title_fullStr Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title_full_unstemmed Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title_short Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
title_sort utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in genechip studies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1361777/
https://www.ncbi.nlm.nih.gov/pubmed/16403228
http://dx.doi.org/10.1186/1471-2105-7-12
work_keys_str_mv AT huzihua utilizationoftwosampletteststatisticsfromredundantprobesetstoevaluatedifferentprobesetalgorithmsingenechipstudies
AT willskygailr utilizationoftwosampletteststatisticsfromredundantprobesetstoevaluatedifferentprobesetalgorithmsingenechipstudies