Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies
BACKGROUND: The choice of probe set algorithms for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with known concentration are often used to assess the relative performance of probe set algorithms. Given the fact that the spiked...
Main authors: | Hu, Zihua; Willsky, Gail R |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1361777/ https://www.ncbi.nlm.nih.gov/pubmed/16403228 http://dx.doi.org/10.1186/1471-2105-7-12 |
_version_ | 1782126723006988288 |
---|---|
author | Hu, Zihua Willsky, Gail R |
collection | PubMed |
description | BACKGROUND: The choice of probe set algorithms for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with known concentration are often used to assess the relative performance of probe set algorithms. Given the fact that the spiked-in cRNAs do not represent endogenously expressed genes in experiments, it becomes increasingly important to have methods to study whether a particular probe set algorithm is more appropriate for a specific dataset, without using such external reference data. RESULTS: We propose the use of the probe set redundancy feature for evaluating the performance of probe set algorithms, and have presented three approaches for analyzing data variance and result bias using two sample t-test statistics from redundant probe sets. These approaches are as follows: 1) analyzing redundant probe set variance based on t-statistic rank order, 2) computing correlation of t-statistics between redundant probe sets, and 3) analyzing the co-occurrence of replicate redundant probe sets representing differentially expressed genes. We applied these approaches to expression summary data generated from three datasets utilizing individual probe set algorithms of MAS5.0, dChip, or RMA. We also utilized combinations of options from the three probe set algorithms. We found that results from the three approaches were similar within each individual expression summary dataset, and were also in good agreement with previously reported findings by others. We also demonstrate the validity of our findings by independent experimental methods. CONCLUSION: All three proposed approaches allowed us to assess the performance of probe set algorithms using the probe set redundancy feature. The analyses of redundant probe set variance based on t-statistic rank order and correlation of t-statistics between redundant probe sets provide useful tools for data variance analysis, and the co-occurrence of replicate redundant probe sets representing differentially expressed genes allows estimation of result bias. The results also suggest that individual probe set algorithms have dataset-specific performance. |
format | Text |
id | pubmed-1361777 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1361777 2006-02-10 Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies Hu, Zihua Willsky, Gail R BMC Bioinformatics Methodology Article BioMed Central 2006-01-10 /pmc/articles/PMC1361777/ /pubmed/16403228 http://dx.doi.org/10.1186/1471-2105-7-12 Text en Copyright © 2006 Hu and Willsky; licensee BioMed Central Ltd. |
title | Utilization of two sample t-test statistics from redundant probe sets to evaluate different probe set algorithms in GeneChip studies |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1361777/ https://www.ncbi.nlm.nih.gov/pubmed/16403228 http://dx.doi.org/10.1186/1471-2105-7-12 |