RAPSearch: a fast protein similarity search tool for short reads
BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--fac...
Main authors: | Ye, Yuzhen; Choi, Jeong-Hyeon; Tang, Haixu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113943/ https://www.ncbi.nlm.nih.gov/pubmed/21575167 http://dx.doi.org/10.1186/1471-2105-12-159 |
_version_ | 1782206009179111424 |
---|---|
author | Ye, Yuzhen Choi, Jeong-Hyeon Tang, Haixu |
author_facet | Ye, Yuzhen Choi, Jeong-Hyeon Tang, Haixu |
author_sort | Ye, Yuzhen |
collection | PubMed |
description | BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the sheer size of the short read datasets. RESULTS: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. CONCLUSIONS: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. |
format | Online Article Text |
id | pubmed-3113943 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3113943 2011-06-14 RAPSearch: a fast protein similarity search tool for short reads Ye, Yuzhen Choi, Jeong-Hyeon Tang, Haixu BMC Bioinformatics Software BACKGROUND: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the sheer size of the short read datasets. RESULTS: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. CONCLUSIONS: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. BioMed Central 2011-05-15 /pmc/articles/PMC3113943/ /pubmed/21575167 http://dx.doi.org/10.1186/1471-2105-12-159 Text en Copyright © 2011 Ye et al; licensee BioMed Central Ltd. https://creativecommons.org/licenses/by/2.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Ye, Yuzhen Choi, Jeong-Hyeon Tang, Haixu RAPSearch: a fast protein similarity search tool for short reads |
title | RAPSearch: a fast protein similarity search tool for short reads |
title_full | RAPSearch: a fast protein similarity search tool for short reads |
title_fullStr | RAPSearch: a fast protein similarity search tool for short reads |
title_full_unstemmed | RAPSearch: a fast protein similarity search tool for short reads |
title_short | RAPSearch: a fast protein similarity search tool for short reads |
title_sort | rapsearch: a fast protein similarity search tool for short reads |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113943/ https://www.ncbi.nlm.nih.gov/pubmed/21575167 http://dx.doi.org/10.1186/1471-2105-12-159 |
work_keys_str_mv | AT yeyuzhen rapsearchafastproteinsimilaritysearchtoolforshortreads AT choijeonghyeon rapsearchafastproteinsimilaritysearchtoolforshortreads AT tanghaixu rapsearchafastproteinsimilaritysearchtoolforshortreads |
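
Note on the abstract: the record describes RAPSearch as using a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. The following is a minimal Python sketch of that general idea, not the RAPSearch implementation. The residue grouping in `GROUPS`, the function names, and the `min_len` threshold are all hypothetical choices made for illustration; the record does not state which alphabet or seed criteria the tool uses.

```python
from bisect import bisect_left

# Hypothetical grouping of amino acids by broad chemical similarity.
# Illustrative only; this is NOT claimed to be RAPSearch's alphabet.
GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(protein):
    """Map each residue to its group representative (unknowns become 'X')."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_seeds(query, text, min_len=3):
    """Return (query_pos, text_pos, length) for the longest match starting
    at each query position, keeping only matches of at least min_len."""
    sa = build_suffix_array(text)
    # Materialising the suffixes keeps the sketch short; it is not
    # memory-efficient and would not scale to a real protein database.
    suffixes = [text[i:] for i in sa]
    seeds = []
    for q in range(len(query)):
        tail = query[q:]
        pos = bisect_left(suffixes, tail)
        best_len, best_pos = 0, -1
        # The suffix sharing the longest prefix with `tail` is always
        # adjacent to the insertion point in sorted order.
        for j in (pos - 1, pos):
            if 0 <= j < len(sa):
                n = common_prefix_len(tail, suffixes[j])
                if n > best_len:
                    best_len, best_pos = n, sa[j]
        if best_len >= min_len:
            seeds.append((q, best_pos, best_len))
    return seeds


# Two peptides that differ at almost every residue but agree after reduction.
db = reduce_seq("MKVLAAGIDE")
query = reduce_seq("LRIMGA")
seeds = find_seeds(query, db, min_len=4)
```

Because matching is done on the reduced strings, the two example peptides yield a seed even though their original residues differ, and the seed length is whatever the longest match happens to be rather than a fixed word size.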