Ultrafast functional profiling of RNA-seq data for nonmodel organisms
Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcript...
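Main authors: | Liu, Peng; Ewald, Jessica; Galvez, Jose Hector; Head, Jessica; Crump, Doug; Bourque, Guillaume; Basu, Niladri; Xia, Jianguo |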
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2021 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015844/ https://www.ncbi.nlm.nih.gov/pubmed/33731361 http://dx.doi.org/10.1101/gr.269894.120 |
_version_ | 1783673757856956416 |
---|---|
author | Liu, Peng Ewald, Jessica Galvez, Jose Hector Head, Jessica Crump, Doug Bourque, Guillaume Basu, Niladri Xia, Jianguo |
author_facet | Liu, Peng Ewald, Jessica Galvez, Jose Hector Head, Jessica Crump, Doug Bourque, Guillaume Basu, Niladri Xia, Jianguo |
author_sort | Liu, Peng |
collection | PubMed |
description | Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcriptome de novo assembly. The pipeline starts with raw read quality control: sequencing error correction, removing poly(A) tails, and joining overlapped paired-end reads. It then conducts a DNA-to-protein search by translating each read into all possible amino acid fragments and subsequently identifies possible homologous sequences in a well-curated protein database. Finally, the pipeline generates several informative outputs including gene abundance tables, pathway and species hit tables, an HTML report to visualize the results, and an output of clean reads annotated with mapped genes ready for downstream analysis. Seq2Fun does not have any intermediate steps of file writing and loading, making I/O very efficient. Seq2Fun is written in C++ and can run on a personal computer with a limited number of CPUs and memory. It can process >2,000,000 reads/min and is >120 times faster than conventional workflows based on de novo assembly, while maintaining high accuracy in our various test data sets. |
format | Online Article Text |
id | pubmed-8015844 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-80158442021-04-21 Ultrafast functional profiling of RNA-seq data for nonmodel organisms Liu, Peng Ewald, Jessica Galvez, Jose Hector Head, Jessica Crump, Doug Bourque, Guillaume Basu, Niladri Xia, Jianguo Genome Res Method Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcriptome de novo assembly. The pipeline starts with raw read quality control: sequencing error correction, removing poly(A) tails, and joining overlapped paired-end reads. It then conducts a DNA-to-protein search by translating each read into all possible amino acid fragments and subsequently identifies possible homologous sequences in a well-curated protein database. Finally, the pipeline generates several informative outputs including gene abundance tables, pathway and species hit tables, an HTML report to visualize the results, and an output of clean reads annotated with mapped genes ready for downstream analysis. Seq2Fun does not have any intermediate steps of file writing and loading, making I/O very efficient. Seq2Fun is written in C++ and can run on a personal computer with a limited number of CPUs and memory. It can process >2,000,000 reads/min and is >120 times faster than conventional workflows based on de novo assembly, while maintaining high accuracy in our various test data sets. Cold Spring Harbor Laboratory Press 2021-04 /pmc/articles/PMC8015844/ /pubmed/33731361 http://dx.doi.org/10.1101/gr.269894.120 Text en © 2021 Liu et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
spellingShingle | Method Liu, Peng Ewald, Jessica Galvez, Jose Hector Head, Jessica Crump, Doug Bourque, Guillaume Basu, Niladri Xia, Jianguo Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title | Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title_full | Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title_fullStr | Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title_full_unstemmed | Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title_short | Ultrafast functional profiling of RNA-seq data for nonmodel organisms |
title_sort | ultrafast functional profiling of rna-seq data for nonmodel organisms |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015844/ https://www.ncbi.nlm.nih.gov/pubmed/33731361 http://dx.doi.org/10.1101/gr.269894.120 |
work_keys_str_mv | AT liupeng ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT ewaldjessica ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT galvezjosehector ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT headjessica ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT crumpdoug ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT bourqueguillaume ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT basuniladri ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms AT xiajianguo ultrafastfunctionalprofilingofrnaseqdatafornonmodelorganisms |
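
The description field above says the pipeline translates each read into all possible amino acid fragments before the protein database search. The following is a minimal Python sketch of that idea only, assuming the standard genetic code and six reading frames (three per strand). It is not Seq2Fun's implementation, which the record describes as written in C++. The function names, the `min_len` parameter, and the choice to split translations at stop codons are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# nuclear code; this is not taken from the catalog record).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons of seq; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_fragments(read, min_len=1):
    """Translate a read in all six frames and split each frame at stop codons.

    min_len is a hypothetical filter that drops fragments shorter than
    the given number of amino acids.
    """
    read = read.upper()
    fragments = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(
                frag for frag in protein.split("*") if len(frag) >= min_len
            )
    return fragments


if __name__ == "__main__":
    print(six_frame_fragments("ATGGCCTAA"))
```

In a real search, each fragment would then be compared against the protein database; a larger `min_len` would discard fragments too short to match reliably.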