GRASP: Guided Reference-based Assembly of Short Peptides
Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on a...
Main authors: | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330339/ https://www.ncbi.nlm.nih.gov/pubmed/25414351 http://dx.doi.org/10.1093/nar/gku1210 |
_version_ | 1782357567320621056 |
---|---|
author | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
author_facet | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
author_sort | Zhong, Cuncong |
collection | PubMed |
description | Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release. |
format | Online Article Text |
id | pubmed-4330339 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4330339 2015-03-18 GRASP: Guided Reference-based Assembly of Short Peptides Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu Nucleic Acids Res Methods Online Oxford University Press 2015-02-18 2014-11-20 /pmc/articles/PMC4330339/ /pubmed/25414351 http://dx.doi.org/10.1093/nar/gku1210 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | GRASP: Guided Reference-based Assembly of Short Peptides |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330339/ https://www.ncbi.nlm.nih.gov/pubmed/25414351 http://dx.doi.org/10.1093/nar/gku1210 |