Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures
RNA-seq is a modular experimental and computational approach that aims to identify and quantify RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also brings forth the importance of methodolog...
Main authors: | Simoneau, Joël; Gosselin, Ryan; Scott, Michelle S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671328/ https://www.ncbi.nlm.nih.gov/pubmed/33575596 http://dx.doi.org/10.1093/nargab/lqaa043 |
_version_ | 1783610909013311488 |
---|---|
author | Simoneau, Joël Gosselin, Ryan Scott, Michelle S |
collection | PubMed |
description | RNA-seq is a modular experimental and computational approach that aims to identify and quantify RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also brings forth the importance of methodological thoroughness. Liberty of approach comes with the responsibility of choices, and such choices must be informed. Here, we present an approach that identifies gene group-specific quantification biases in current RNA-seq software and references by processing datasets using diverse RNA-seq computational pipelines, and by decomposing these expression datasets with an independent component analysis matrix factorization method. By exploring the RNA-seq pipeline using this systemic approach, we identify genome annotations as a design choice that affects quantification results to the same extent as the choice of aligners and quantifiers. We also show that the different choices in RNA-seq methodology are not independent, identifying interactions between genome annotations and quantification software. Genes were mainly affected by differences in their sequence, by overlapping genes and by genes with similar sequences. Our approach offers an explanation for the observed biases by identifying the common features used differently by the software and references, therefore providing leads for the betterment of RNA-seq methodology. |
format | Online Article Text |
id | pubmed-7671328 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7671328 2021-02-10 Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures Simoneau, Joël Gosselin, Ryan Scott, Michelle S NAR Genom Bioinform Standard Article Oxford University Press 2020-06-29 /pmc/articles/PMC7671328/ /pubmed/33575596 http://dx.doi.org/10.1093/nargab/lqaa043 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671328/ https://www.ncbi.nlm.nih.gov/pubmed/33575596 http://dx.doi.org/10.1093/nargab/lqaa043 |