
Robust differential expression analysis by learning discriminant boundary in multi-dimensional space of statistical attributes

Bibliographic Details
Main authors:	Bei, Yuanzhe; Hong, Pengyu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5168810/ https://www.ncbi.nlm.nih.gov/pubmed/27993137 http://dx.doi.org/10.1186/s12859-016-1386-x

collection	PubMed
description	BACKGROUND: Performing statistical tests is an important step in analyzing genome-wide datasets for detecting genomic features differentially expressed between conditions. Each type of statistical test has its own advantages in characterizing certain aspects of differences between population means and often assumes a relatively simple data distribution (e.g., Gaussian, Poisson, negative binomial), which the datasets of interest may not satisfy well. Inadequate distributional assumptions can lead to inferior results when dealing with complex differential expression patterns.
RESULTS: We propose to capture differential expression information more comprehensively by integrating multiple test statistics, each of which has a relatively limited capacity to summarize the observed differential expression information. This work addresses a general application scenario in which users want to detect as many differentially expressed genomic features (DEFs) as possible while keeping the false discovery rate (FDR) below a cut-off. We treat each test statistic as a basic attribute and model the detection of differentially expressed genomic features as learning a discriminant boundary in a multi-dimensional space of basic attributes. We formulate this goal mathematically as a constrained optimization problem that maximizes the number of discoveries while satisfying a user-defined FDR. An effective algorithm, Discriminant-Cut, has been developed to solve an instantiation of this problem. Extensive comparisons of Discriminant-Cut with 13 existing methods were carried out to demonstrate its robustness and effectiveness.
CONCLUSIONS: We have developed a novel machine learning methodology for robust differential expression analysis, which can be a new avenue to significantly advance research on large-scale differential expression analysis.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1386-x) contains supplementary material, which is available to authorized users.
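The abstract describes the approach only at a high level, so what follows is a minimal sketch of the general idea under stated assumptions, not the authors' Discriminant-Cut algorithm. It assumes two groups and two hypothetical basic attributes per feature (the absolute Welch t-statistic and the absolute difference of group means), restricts the discriminant boundary to a line w·a ≥ c in that attribute space, and estimates the FDR of each candidate cut SAM-style from label permutations. The function names `attributes` and `fdr_constrained_cut`, and all parameter defaults, are illustrative.

```python
import math
import random
from bisect import bisect_left
from statistics import mean, variance


def attributes(row, labels):
    """Hypothetical attribute vector: (|Welch t|, |difference of group means|).

    Assumes each group has at least two samples (statistics.variance needs two).
    """
    a = [x for x, g in zip(row, labels) if g == 0]
    b = [x for x, g in zip(row, labels) if g == 1]
    diff = mean(b) - mean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = diff / se if se > 0 else 0.0  # constant feature: treat t as 0
    return (abs(t), abs(diff))


def fdr_constrained_cut(data, labels, alpha=0.1, n_perm=20, n_angles=11, seed=0):
    """Choose a linear cut w.a >= c that maximizes discoveries with estimated FDR <= alpha.

    FDR of a cut is estimated as the mean number of permutation-null scores
    above c divided by the number of observed scores above c.
    """
    rng = random.Random(seed)
    observed = [attributes(row, labels) for row in data]
    flat = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # permuting labels keeps the group sizes fixed
        flat.extend(attributes(row, perm) for row in data)

    # Put both attributes on a comparable scale using their average null magnitude.
    scale = [mean(a[k] for a in flat) or 1.0 for k in range(2)]
    observed = [(a[0] / scale[0], a[1] / scale[1]) for a in observed]
    flat = [(a[0] / scale[0], a[1] / scale[1]) for a in flat]

    best_n, best_w, best_c, best_fdr = 0, None, None, 0.0
    for i in range(n_angles):
        # Sweep boundary orientations across the positive quadrant.
        theta = (math.pi / 2) * i / (n_angles - 1)
        w = (math.cos(theta), math.sin(theta))
        obs = sorted(w[0] * a[0] + w[1] * a[1] for a in observed)
        null = sorted(w[0] * a[0] + w[1] * a[1] for a in flat)
        for c in set(obs):
            n_obs = len(obs) - bisect_left(obs, c)
            n_null = (len(null) - bisect_left(null, c)) / n_perm
            fdr = n_null / n_obs
            if fdr <= alpha and n_obs > best_n:
                best_n, best_w, best_c, best_fdr = n_obs, w, c, fdr

    if best_w is None:
        return [], None, None, 0.0
    selected = [j for j, a in enumerate(observed)
                if best_w[0] * a[0] + best_w[1] * a[1] >= best_c]
    return selected, best_w, best_c, best_fdr
```

Calling `fdr_constrained_cut(data, labels, alpha=0.05)` on a features-by-samples list of rows and a 0/1 label list returns the selected feature indices together with the chosen weights, cutoff, and estimated FDR. The grid search over angles is the simplest stand-in for the constrained optimization the abstract mentions; because orientation and cutoff are chosen using the same permutation null, the reported FDR is an optimistic estimate.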
id	pubmed-5168810
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics (Methodology Article)
published	BioMed Central, 2016-12-19 (record pubmed-5168810, 2016-12-23)
rights	© The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.