A fuzzy method for RNA-Seq differential expression analysis in presence of multireads
BACKGROUND: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them - known as multireads - can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping...
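Main authors: | Consiglio, Arianna; Mencar, Corrado; Grillo, Giorgio; Marzano, Flaviana; Caratozzolo, Mariano Francesco; Liuni, Sabino |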
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2016
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123383/ https://www.ncbi.nlm.nih.gov/pubmed/28185579 http://dx.doi.org/10.1186/s12859-016-1195-2 |
_version_ | 1782469725313302528 |
---|---|
author | Consiglio, Arianna Mencar, Corrado Grillo, Giorgio Marzano, Flaviana Caratozzolo, Mariano Francesco Liuni, Sabino |
author_facet | Consiglio, Arianna Mencar, Corrado Grillo, Giorgio Marzano, Flaviana Caratozzolo, Mariano Francesco Liuni, Sabino |
author_sort | Consiglio, Arianna |
collection | PubMed |
description | BACKGROUND: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them - known as multireads - can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. In RNA-Seq analyses, removing the multireads from the mapping results causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences. RESULTS: We present an innovative approach to deal with multireads and evaluate differential expression events, entirely based on fuzzy set theory. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also influence the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts, and it evaluates the possibility that a gene is differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We have tested the method on RNA-Seq data designed for case-control studies and compared the results with those of other existing tools for read count estimation and differential expression analysis. CONCLUSIONS: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Biologists can use this additional information when selecting the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1195-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5123383 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5123383 2016-12-08 A fuzzy method for RNA-Seq differential expression analysis in presence of multireads Consiglio, Arianna Mencar, Corrado Grillo, Giorgio Marzano, Flaviana Caratozzolo, Mariano Francesco Liuni, Sabino BMC Bioinformatics Research BACKGROUND: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them - known as multireads - can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. In RNA-Seq analyses, removing the multireads from the mapping results causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences. RESULTS: We present an innovative approach to deal with multireads and evaluate differential expression events, entirely based on fuzzy set theory. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also influence the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts, and it evaluates the possibility that a gene is differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We have tested the method on RNA-Seq data designed for case-control studies and compared the results with those of other existing tools for read count estimation and differential expression analysis. CONCLUSIONS: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Biologists can use this additional information when selecting the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1195-2) contains supplementary material, which is available to authorized users. BioMed Central 2016-11-08 /pmc/articles/PMC5123383/ /pubmed/28185579 http://dx.doi.org/10.1186/s12859-016-1195-2 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Consiglio, Arianna Mencar, Corrado Grillo, Giorgio Marzano, Flaviana Caratozzolo, Mariano Francesco Liuni, Sabino A fuzzy method for RNA-Seq differential expression analysis in presence of multireads |
title | A fuzzy method for RNA-Seq differential expression analysis in presence of multireads |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123383/ https://www.ncbi.nlm.nih.gov/pubmed/28185579 http://dx.doi.org/10.1186/s12859-016-1195-2 |
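
The description above mentions fuzzy read counts and three fuzzy concepts (over-expression, same-expression, under-expression) but gives no formulas. The following Python sketch only illustrates the general idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the triangular shape of the fuzzy count, the names `FuzzyCount` and `possibilities`, the fold-change threshold, the pseudocount and the alpha-cut grid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyCount:
    """Hypothetical triangular fuzzy read count for one gene in one sample."""
    low: float   # assumed lower bound: reads mapped uniquely to the gene
    peak: float  # assumed most plausible count (some point estimate)
    high: float  # assumed upper bound: unique reads plus all multireads

    def __post_init__(self):
        if not (0 <= self.low <= self.peak <= self.high):
            raise ValueError("expected 0 <= low <= peak <= high")

    def alpha_cut(self, alpha):
        """Interval of counts whose membership degree is at least alpha."""
        lo = self.low + alpha * (self.peak - self.low)
        hi = self.high - alpha * (self.high - self.peak)
        return lo, hi


def possibilities(control, case, log2_fc=1.0, steps=100, pseudo=1.0):
    """Possibility degrees of under-, same- and over-expression.

    For each alpha level, the case/control ratio can lie anywhere in
    [r_min, r_max]. A concept is possible at level alpha if that interval
    contains a ratio satisfying it. Alpha-cuts are nested, so the largest
    alpha that still satisfies a concept is its possibility degree.
    """
    t = 2.0 ** log2_fc
    poss = {"under": 0.0, "same": 0.0, "over": 0.0}
    for i in range(steps + 1):
        alpha = i / steps
        c_lo, c_hi = control.alpha_cut(alpha)
        k_lo, k_hi = case.alpha_cut(alpha)
        r_min = (k_lo + pseudo) / (c_hi + pseudo)  # smallest feasible ratio
        r_max = (k_hi + pseudo) / (c_lo + pseudo)  # largest feasible ratio
        if r_max >= t:
            poss["over"] = alpha
        if r_min <= 1.0 / t:
            poss["under"] = alpha
        if r_min <= t and r_max >= 1.0 / t:
            poss["same"] = alpha
    return poss


# A gene with no multireads in the control and many multireads in the case.
control = FuzzyCount(low=100, peak=100, high=100)
case = FuzzyCount(low=100, peak=150, high=400)
print(possibilities(control, case))
```

In this sketch, a gene whose over-expression is possible only at a low degree while same-expression remains fully possible would be a candidate false positive: the apparent change depends on how the multireads are assigned.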