aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data

BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of symmetrical distribution of up- and down-regulated ge...

Bibliographic Details
Main Authors: Yang, Wentao; Rosenstiel, Philip; Schulenburg, Hinrich
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509820/
https://www.ncbi.nlm.nih.gov/pubmed/31077153
http://dx.doi.org/10.1186/s12864-019-5686-1
_version_ 1783417324387172352
author Yang, Wentao
Rosenstiel, Philip
Schulenburg, Hinrich
author_facet Yang, Wentao
Rosenstiel, Philip
Schulenburg, Hinrich
author_sort Yang, Wentao
collection PubMed
description BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of a symmetrical distribution of up- and down-regulated genes, and the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices. RESULTS: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes account of the overall distribution of read counts for data standardization, enhancing reliable identification of differential gene expression, especially in the case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications. CONCLUSIONS: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance reliable identification of significant differential expression across distinct data distributions. It outcompetes alternative procedures in the case of asymmetrical distributions of up- versus down-regulated genes and also in the presence of outliers, all common to real data sets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5686-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6509820
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6509820 2019-06-05 aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data Yang, Wentao Rosenstiel, Philip Schulenburg, Hinrich BMC Genomics Software BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of a symmetrical distribution of up- and down-regulated genes, and the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices. RESULTS: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes account of the overall distribution of read counts for data standardization, enhancing reliable identification of differential gene expression, especially in the case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications. CONCLUSIONS: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance reliable identification of significant differential expression across distinct data distributions.
It outcompetes alternative procedures in the case of asymmetrical distributions of up- versus down-regulated genes and also in the presence of outliers, all common to real data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5686-1) contains supplementary material, which is available to authorized users. BioMed Central 2019-05-10 /pmc/articles/PMC6509820/ /pubmed/31077153 http://dx.doi.org/10.1186/s12864-019-5686-1 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Yang, Wentao
Rosenstiel, Philip
Schulenburg, Hinrich
aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data
title aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509820/
https://www.ncbi.nlm.nih.gov/pubmed/31077153
http://dx.doi.org/10.1186/s12864-019-5686-1