
Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment



Bibliographic Details
Main authors: Shrestha, Anish M.S.; B. Guiao, Joyce Emlyn; R. Santiago, Kyle Christian
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8815227/
https://www.ncbi.nlm.nih.gov/pubmed/35120462
http://dx.doi.org/10.1186/s12864-021-08278-7
_version_ 1784645240105205760
author Shrestha, Anish M.S.
B. Guiao, Joyce Emlyn
R. Santiago, Kyle Christian
author_sort Shrestha, Anish M.S.
collection PubMed
description BACKGROUND: RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. For organisms that lack a well-annotated reference genome or transcriptome, a conventional RNA-seq data analysis workflow requires constructing a de-novo transcriptome assembly and annotating it against a high-confidence protein database. The assembly serves as a reference for read mapping, and the annotation is necessary for functional analysis of genes found to be differentially expressed. However, assembly is computationally expensive. It is also prone to errors that impact expression analysis, especially since sequencing depth is typically much lower for expression studies than for transcript discovery.
RESULTS: We propose a shortcut, in which we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the high-confidence proteome that would have been otherwise used for annotation. By avoiding assembly, we drastically cut down computational costs – the running time on a typical dataset improves from the order of tens of hours to under half an hour, and the memory requirement is reduced from the order of tens of Gbytes to tens of Mbytes. We show through experiments on simulated and real data that our pipeline not only reduces computational costs, but has higher sensitivity and precision than a typical assembly-based pipeline. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar.
CONCLUSIONS: The flip side of RNA-seq becoming accessible to even modestly resourced labs has been that the time, labor, and infrastructure cost of bioinformatics analysis has become a bottleneck. Assembly is one such resource-hungry process, and we show here that it can be avoided for quick and easy, yet more sensitive and precise, differential gene expression analysis in non-model organisms.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-021-08278-7).
format Online
Article
Text
id pubmed-8815227
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8815227 2022-02-07 Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment Shrestha, Anish M.S. B. Guiao, Joyce Emlyn R. Santiago, Kyle Christian BMC Genomics Software BioMed Central 2022-02-04 /pmc/articles/PMC8815227/ /pubmed/35120462 http://dx.doi.org/10.1186/s12864-021-08278-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment
title_sort assembly-free rapid differential gene expression analysis in non-model organisms using dna-protein alignment
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8815227/
https://www.ncbi.nlm.nih.gov/pubmed/35120462
http://dx.doi.org/10.1186/s12864-021-08278-7