
Ultrafast functional profiling of RNA-seq data for nonmodel organisms

Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcript...


Bibliographic Details
Main authors: Liu, Peng; Ewald, Jessica; Galvez, Jose Hector; Head, Jessica; Crump, Doug; Bourque, Guillaume; Basu, Niladri; Xia, Jianguo
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015844/
https://www.ncbi.nlm.nih.gov/pubmed/33731361
http://dx.doi.org/10.1101/gr.269894.120
_version_ 1783673757856956416
author Liu, Peng
Ewald, Jessica
Galvez, Jose Hector
Head, Jessica
Crump, Doug
Bourque, Guillaume
Basu, Niladri
Xia, Jianguo
collection PubMed
description Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcriptome de novo assembly. The pipeline starts with raw read quality control: sequencing error correction, removing poly(A) tails, and joining overlapped paired-end reads. It then conducts a DNA-to-protein search by translating each read into all possible amino acid fragments and subsequently identifies possible homologous sequences in a well-curated protein database. Finally, the pipeline generates several informative outputs including gene abundance tables, pathway and species hit tables, an HTML report to visualize the results, and an output of clean reads annotated with mapped genes ready for downstream analysis. Seq2Fun does not have any intermediate steps of file writing and loading, making I/O very efficient. Seq2Fun is written in C++ and can run on a personal computer with a limited number of CPUs and memory. It can process >2,000,000 reads/min and is >120 times faster than conventional workflows based on de novo assembly, while maintaining high accuracy in our various test data sets.
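The DNA-to-protein step described above (translating each read into all possible amino acid fragments) can be illustrated with a six-frame translation. The following is a minimal Python sketch under stated assumptions, not Seq2Fun's actual C++ implementation: the function names `six_frame` and `amino_acid_fragments`, the `min_len` parameter, the use of the standard genetic code, and the choice to split frames at stop codons are all hypothetical and introduced here for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the usual TCAG ordering.
# Assumption: the standard code is used; the record does not state which table applies.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(read: str) -> list[str]:
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def amino_acid_fragments(read: str, min_len: int = 1) -> list[str]:
    """Split each frame at stop codons ('*') and keep fragments of at least min_len.

    Hypothetical helper: the stop-codon split and length filter are assumptions.
    """
    fragments = []
    for frame in six_frame(read):
        fragments.extend(f for f in frame.split("*") if len(f) >= min_len)
    return fragments
```

For example, `six_frame("ATGGCC")` yields one peptide per frame, forward frames first. In a real tool the resulting fragments would then be searched against a protein database; that search is not sketched here.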
format Online
Article
Text
id pubmed-8015844
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8015844 2021-04-21 Ultrafast functional profiling of RNA-seq data for nonmodel organisms. Genome Res. Method. Cold Spring Harbor Laboratory Press 2021-04 /pmc/articles/PMC8015844/ /pubmed/33731361 http://dx.doi.org/10.1101/gr.269894.120 Text en © 2021 Liu et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Ultrafast functional profiling of RNA-seq data for nonmodel organisms
topic Method