Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method
BACKGROUND: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. RESULTS: We interpreted the PCs as a mere signal-rich coordina...
Main Authors: | Foley, Joseph W; Katagiri, Fumiaki |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2607290/ https://www.ncbi.nlm.nih.gov/pubmed/19040754 http://dx.doi.org/10.1186/1471-2105-9-508 |
_version_ | 1782163045662851072 |
---|---|
author | Foley, Joseph W Katagiri, Fumiaki |
author_facet | Foley, Joseph W Katagiri, Fumiaki |
author_sort | Foley, Joseph W |
collection | PubMed |
description | BACKGROUND: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. RESULTS: We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, Sorted PRincipal component-guided Noise Reduction (RSPR-NR). Robust performance of RSPR-NR, measured by noise reduction and retention of small features, was demonstrated using simulated data sets. Furthermore, when applied to an actual expression profile data set, RSPR-NR preferentially increased the correlations between genes that share the same Gene Ontology terms, strongly suggesting reduction of random noise in the data set. CONCLUSION: RSPR-NR is a robust random noise reduction method that retains small features well. It should be useful in improving the quality of large biological data sets. |
format | Text |
id | pubmed-2607290 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-26072902008-12-24 Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method Foley, Joseph W Katagiri, Fumiaki BMC Bioinformatics Methodology Article BACKGROUND: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. RESULTS: We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, Sorted PRincipal component-guided Noise Reduction (RSPR-NR). Robust performance of RSPR-NR, measured by noise reduction and retention of small features, was demonstrated using simulated data sets. Furthermore, when applied to an actual expression profile data set, RSPR-NR preferentially increased the correlations between genes that share the same Gene Ontology terms, strongly suggesting reduction of random noise in the data set. CONCLUSION: RSPR-NR is a robust random noise reduction method that retains small features well. It should be useful in improving the quality of large biological data sets. 
BioMed Central 2008-11-29 /pmc/articles/PMC2607290/ /pubmed/19040754 http://dx.doi.org/10.1186/1471-2105-9-508 Text en Copyright © 2008 Foley and Katagiri; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Foley, Joseph W Katagiri, Fumiaki Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title | Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title_full | Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title_fullStr | Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title_full_unstemmed | Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title_short | Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
title_sort | unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2607290/ https://www.ncbi.nlm.nih.gov/pubmed/19040754 http://dx.doi.org/10.1186/1471-2105-9-508 |
work_keys_str_mv | AT foleyjosephw unsupervisedreductionofrandomnoiseincomplexdatabyarowspecificsortedprincipalcomponentguidedmethod AT katagirifumiaki unsupervisedreductionofrandomnoiseincomplexdatabyarowspecificsortedprincipalcomponentguidedmethod |
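
The procedure summarized in the description field can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the record does not give the details it needs, so several choices here are assumptions: the noise is taken to be Gaussian with a known standard deviation `sigma`, the reference distribution of ordered squared noise is estimated by Monte Carlo simulation, a coordinate counts as significant when it exceeds a chosen quantile at its rank, and only the leading run of significant coordinates in each row is kept. The function names (`order_stat_thresholds`, `denoise_rows`, `rspr_nr_sketch`) and all parameters are hypothetical.

```python
import numpy as np


def order_stat_thresholds(n_coords, sigma, quantile=0.95, n_sim=2000, rng=None):
    """Per-rank thresholds from simulated ordered squared Gaussian noise (assumed noise model)."""
    rng = np.random.default_rng(rng)
    noise_sq = (sigma * rng.standard_normal((n_sim, n_coords))) ** 2
    noise_sq_sorted = -np.sort(-noise_sq, axis=1)  # each simulated row in descending order
    return np.quantile(noise_sq_sorted, quantile, axis=0)


def denoise_rows(data, sigma, quantile=0.95, rng=None):
    """Nullify, row by row, the PC-coordinates that look like random noise."""
    data = np.asarray(data, dtype=float)
    col_means = data.mean(axis=0)
    centered = data - col_means
    # Rows of vt are the principal axes; they serve only as a coordinate system.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt.T
    thresholds = order_stat_thresholds(coords.shape[1], sigma, quantile, rng=rng)

    sq = coords ** 2
    order = np.argsort(-sq, axis=1)  # per-row ranking, largest first
    sorted_sq = np.take_along_axis(sq, order, axis=1)
    # Assumed rule: keep the leading run of coordinates that exceed their rank's threshold.
    keep_sorted = np.logical_and.accumulate(sorted_sq > thresholds, axis=1)
    keep = np.zeros_like(keep_sorted)
    np.put_along_axis(keep, order, keep_sorted, axis=1)

    cleaned = np.where(keep, coords, 0.0)
    return cleaned @ vt + col_means  # back to the initial coordinates


def rspr_nr_sketch(data, sigma, subset_size, n_subsets, quantile=0.95, seed=0):
    """Average the denoised rows over random row subsets."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows = data.shape[0]
    total = np.zeros_like(data)
    counts = np.zeros(n_rows)
    for _ in range(n_subsets):
        idx = rng.choice(n_rows, size=subset_size, replace=False)
        total[idx] += denoise_rows(data[idx], sigma, quantile, rng)
        counts[idx] += 1
    out = data.copy()  # rows that were never sampled are left unchanged
    sampled = counts > 0
    out[sampled] = total[sampled] / counts[sampled, None]
    return out
```

With a matrix `noisy` of shape (rows, columns), a call such as `rspr_nr_sketch(noisy, sigma=1.0, subset_size=50, n_subsets=40)` returns an array of the same shape. In practice the noise level would have to be estimated from the data, which this sketch does not attempt.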