
RAPSearch: a fast protein similarity search tool for short reads

BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--fac...


Bibliographic Details
Main authors: Ye, Yuzhen, Choi, Jeong-Hyeon, Tang, Haixu
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113943/
https://www.ncbi.nlm.nih.gov/pubmed/21575167
http://dx.doi.org/10.1186/1471-2105-12-159
_version_ 1782206009179111424
author Ye, Yuzhen
Choi, Jeong-Hyeon
Tang, Haixu
collection PubMed
description BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. RESULTS: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. CONCLUSIONS: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.
format Online
Article
Text
id pubmed-3113943
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3113943 2011-06-14 RAPSearch: a fast protein similarity search tool for short reads Ye, Yuzhen Choi, Jeong-Hyeon Tang, Haixu BMC Bioinformatics Software BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. RESULTS: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. CONCLUSIONS: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. BioMed Central 2011-05-15 /pmc/articles/PMC3113943/ /pubmed/21575167 http://dx.doi.org/10.1186/1471-2105-12-159 Text en Copyright © 2011 Ye et al; licensee BioMed Central Ltd. https://creativecommons.org/licenses/by/2.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title RAPSearch: a fast protein similarity search tool for short reads
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113943/
https://www.ncbi.nlm.nih.gov/pubmed/21575167
http://dx.doi.org/10.1186/1471-2105-12-159
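The abstract describes seed detection with a reduced amino acid alphabet and a suffix array. The following is a minimal sketch of that general idea, not the RAPSearch implementation: the residue grouping in GROUPS, the function names, and the naive suffix array construction are all assumptions made for illustration, and six-frame translation of DNA reads is not covered.

```python
# Hypothetical grouping of the 20 amino acids into 10 classes.
# Illustration only; this is NOT claimed to be the alphabet RAPSearch uses.
GROUPS = ["AST", "C", "DE", "FWY", "G", "H", "ILMV", "KR", "NQ", "P"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(protein):
    """Map each residue to its class representative; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def build_suffix_array(text):
    """Naive suffix array: start offsets sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_seed(db, sa, query, qpos):
    """Return (length, db_offset) of the longest match of query[qpos:] in db.

    Binary search finds where the query suffix would be inserted in the
    sorted suffixes; the longest match is at one of the two neighbours.
    """
    target = query[qpos:]
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if db[sa[mid]:] < target:
            lo = mid + 1
        else:
            hi = mid
    best = (0, -1)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_len(db[sa[idx]:], target)
            if n > best[0]:
                best = (n, sa[idx])
    return best


def find_seeds(db_protein, query_protein, min_len):
    """List (query_pos, db_offset, length) for reduced-alphabet matches >= min_len."""
    db = reduce_seq(db_protein)
    query = reduce_seq(query_protein)
    sa = build_suffix_array(db)
    seeds = []
    for qpos in range(len(query)):
        length, offset = longest_seed(db, sa, query, qpos)
        if length >= min_len:
            seeds.append((qpos, offset, length))
    return seeds


if __name__ == "__main__":
    # The two toy sequences share few identical residues but match in the reduced alphabet.
    print(find_seeds("MKVLAAGIT", "RLLSTGM", min_len=5))
```

Because the match length is not fixed in advance, each query position yields the longest seed available, which is one way to read "seeds of flexible length". A practical tool would build the suffix array with a linear-time or near-linear-time algorithm rather than by sorting suffixes directly.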