Cargando…
Principal component tests: applied to temporal gene expression data
BACKGROUND: Clustering analysis is a common statistical tool for knowledge discovery. It is mainly conducted when a project still is in the exploratory phase without any priori hypotheses. However, the statistical significance testing between the clusters can be meaningful in helping the researchers...
Autores principales: | , , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2009
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648765/ https://www.ncbi.nlm.nih.gov/pubmed/19208126 http://dx.doi.org/10.1186/1471-2105-10-S1-S26 |
Sumario: | BACKGROUND: Clustering analysis is a common statistical tool for knowledge discovery. It is mainly conducted when a project still is in the exploratory phase without any priori hypotheses. However, the statistical significance testing between the clusters can be meaningful in helping the researchers to assess if the classification results from implementing a clustering algorithm need to be improved, even after the cluster number has been determined by a well-established criterion. This is important when we want to identify highly-specific patterns through classification. RESULTS: We proposed to use a principal component (PC) test, which is an implementation of an exact F statistic for the measures at multiple endpoints based on elliptical distribution theory, to assess the statistical significance between clusters. A challenge in the implementation is the choice of the number (q) of principal components to be considered, which can severely influence the statistical power of the method. We optimized the determination via validation according to a permutation test based on the clustering to be evaluated. The method was applied to a public dataset in classifying genes according to their temporal gene expression profiles. CONCLUSION: The results demonstrated that the PC testing were useful for determining the optimal number of clusters. |
---|