GRASP: Guided Reference-based Assembly of Short Peptides

Bibliographic Details
Main Authors: Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330339/
https://www.ncbi.nlm.nih.gov/pubmed/25414351
http://dx.doi.org/10.1093/nar/gku1210
Collection: PubMed
Description: Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.
Record ID: pubmed-4330339
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Methods Online)
Dates in record: 2015-03-18; 2015-02-18; 2014-11-20
Rights: © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.