
aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data

BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of symmetrical distribution of up- and down-regulated ge...


Bibliographic Details
Main Authors:	Yang, Wentao; Rosenstiel, Philip; Schulenburg, Hinrich
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509820/ https://www.ncbi.nlm.nih.gov/pubmed/31077153 http://dx.doi.org/10.1186/s12864-019-5686-1

collection	PubMed
description	BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of symmetrical distribution of up- and down-regulated genes, the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices. RESULTS: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes account of the overall distribution of read counts for data standardization, enhancing reliable identification of differential gene expression, especially in the case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications. CONCLUSIONS: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance reliable identification of significant differential expression across distinct data distributions. It outcompetes alternative procedures in the case of asymmetrical distributions of up- versus down-regulated genes and also in the presence of outliers, all common to real data sets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5686-1) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6509820
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6509820 2019-06-05 BMC Genomics Software BioMed Central 2019-05-10 /pmc/articles/PMC6509820/ /pubmed/31077153 http://dx.doi.org/10.1186/s12864-019-5686-1 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509820/ https://www.ncbi.nlm.nih.gov/pubmed/31077153 http://dx.doi.org/10.1186/s12864-019-5686-1


Similar Items