A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA

Summary: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ∼10 000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that ar...

Bibliographic Details
Main authors: Huson, Daniel H., Xie, Chao
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Hitseq Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866550/
https://www.ncbi.nlm.nih.gov/pubmed/23658416
http://dx.doi.org/10.1093/bioinformatics/btt254
collection PubMed
description Summary: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ∼10 000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that are highly correlated to those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800 000 CPU hours, leading to the same clustering of samples by functional profiles. Availability: PAUDA is freely available from http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website. Contact: daniel.huson@uni-tuebingen.de or xiechao@bic.nus.edu.sg
id pubmed-3866550
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3866550 2013-12-18 Bioinformatics Hitseq Papers 2014-01-01 2013-05-07 Text en
© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Hitseq Papers