aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data
Main authors: | Yang, Wentao; Rosenstiel, Philip; Schulenburg, Hinrich |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509820/ https://www.ncbi.nlm.nih.gov/pubmed/31077153 http://dx.doi.org/10.1186/s12864-019-5686-1 |
author | Yang, Wentao; Rosenstiel, Philip; Schulenburg, Hinrich |
collection | PubMed |
description | BACKGROUND: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of a symmetrical distribution of up- and down-regulated genes, the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices. RESULTS: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes into account the overall distribution of read counts for data standardization, enhancing the reliable identification of differential gene expression, especially in the case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications. CONCLUSIONS: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance the reliable identification of significant differential expression across distinct data distributions. It outcompetes alternative procedures in the case of asymmetrical distributions of up- versus down-regulated genes and in the presence of outliers, both common in real data sets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5686-1) contains supplementary material, which is available to authorized users. |
id | pubmed-6509820 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Genomics (Software article) |
publication_date | 2019-05-10 |
license | © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Software |