Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment

BACKGROUND: To cancel experimental variations, microarray data must be normalized prior to analysis. Where an appropriate model for statistical data distribution is available, a parametric method can normalize a group of data sets that have common distributions. Although such models have been propos...

Bibliographic Details
Main author:	Konishi, Tomokazu
Format:	Text
Language:	English
Published:	BioMed Central 2004
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC333424/ https://www.ncbi.nlm.nih.gov/pubmed/14718068 http://dx.doi.org/10.1186/1471-2105-5-5

author	Konishi, Tomokazu
collection	PubMed
description	BACKGROUND: To cancel experimental variations, microarray data must be normalized prior to analysis. Where an appropriate model for statistical data distribution is available, a parametric method can normalize a group of data sets that have common distributions. Although such models have been proposed for microarray data, they have not always fit the distribution of real data and thus have been inappropriate for normalization. Consequently, microarray data in most cases have been normalized with non-parametric methods that adjust data in a pair-wise manner. However, data analysis and the integration of resultant knowledge among experiments have been difficult, since such normalization concepts lack a universal standard. RESULTS: A three-parameter lognormal distribution model was tested on over 300 sets of microarray data. The model treats the hybridization background, which is difficult to identify from images of hybridization, as one of the parameters. A rigorous coincidence of the model to data sets was found, proving the model's appropriateness for microarray data. In fact, a closer fitting to Northern analysis was obtained. The model showed inconsistency only at very strong or weak data intensities. Measurement of z-scores as well as calculated ratios was reproducible only among data in the model-consistent intensity range; also, the ratios were independent of signal intensity at the corresponding range. CONCLUSION: The model could provide a universal standard for data, simplifying data analysis and knowledge integration. It was deduced that the ranges of inconsistency were caused by experimental errors or additive noise in the data; therefore, excluding the data corresponding to those marginal ranges will prevent misleading analytical conclusions.
format	Text
id	pubmed-333424
institution	National Center for Biotechnology Information
language	English
publishDate	2004
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-333424 2004-02-08 Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment Konishi, Tomokazu BMC Bioinformatics Methodology Article BioMed Central 2004-01-13 /pmc/articles/PMC333424/ /pubmed/14718068 http://dx.doi.org/10.1186/1471-2105-5-5 Text en Copyright © 2004 Konishi; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title	Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC333424/ https://www.ncbi.nlm.nih.gov/pubmed/14718068 http://dx.doi.org/10.1186/1471-2105-5-5

Similar items