Discovering multi–level structures in bio-molecular data through the Bernstein inequality

BACKGROUND: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have been recently proposed to assess the reliability of a clustering procedure and to estimate the “optimal” nu...

Bibliographic Details
Main authors:	Bertoni, Alberto; Valentini, Giorgio
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2323667/ https://www.ncbi.nlm.nih.gov/pubmed/18387206 http://dx.doi.org/10.1186/1471-2105-9-S2-S4

author	Bertoni, Alberto; Valentini, Giorgio
collection	PubMed
description	BACKGROUND: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have recently been proposed to assess the reliability of a clustering procedure and to estimate the “optimal” number of clusters in bio-molecular data. A major problem with stability-based methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes) and the assessment of their statistical significance. In this context, a chi-square based statistical hypothesis test has been proposed; however, ensuring the correctness of this technique requires some assumptions about the distribution of the data. RESULTS: To assess the statistical significance of multi-level structures in bio-molecular data and to discover them, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, thus ensuring reliable application to a wide range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method. CONCLUSIONS: The Bernstein test, due to its loose assumptions, is more sensitive than the chi-square test in detecting multiple structures simultaneously present in the data. Nevertheless, it is less selective, that is, subject to more false positives; by adding independence assumptions, a more selective variant of the Bernstein inequality-based test is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data.
format	Text
id	pubmed-2323667
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2323667 2008-04-22 Discovering multi–level structures in bio-molecular data through the Bernstein inequality Bertoni, Alberto; Valentini, Giorgio BMC Bioinformatics Research
BioMed Central 2008-03-26 /pmc/articles/PMC2323667/ /pubmed/18387206 http://dx.doi.org/10.1186/1471-2105-9-S2-S4 Text en Copyright © 2008 Bertoni and Valentini; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Discovering multi–level structures in bio-molecular data through the Bernstein inequality
topic	Research
