GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly

BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified...

Bibliographic Details
Main Authors: Zhong, Cuncong, Yang, Youngik, Yooseph, Shibu
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009819/
https://www.ncbi.nlm.nih.gov/pubmed/27585568
http://dx.doi.org/10.1186/s12859-016-1119-1
_version_ 1782451582432968704
author Zhong, Cuncong
Yang, Youngik
Yooseph, Shibu
author_facet Zhong, Cuncong
Yang, Youngik
Yooseph, Shibu
author_sort Zhong, Cuncong
collection PubMed
description BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism. RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. 
GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates. CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1119-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5009819
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-50098192016-09-09 GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly Zhong, Cuncong Yang, Youngik Yooseph, Shibu BMC Bioinformatics Research BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism. RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. 
At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates. CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1119-1) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-31 /pmc/articles/PMC5009819/ /pubmed/27585568 http://dx.doi.org/10.1186/s12859-016-1119-1 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zhong, Cuncong
Yang, Youngik
Yooseph, Shibu
GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title_full GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title_fullStr GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title_full_unstemmed GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title_short GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
title_sort graspx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009819/
https://www.ncbi.nlm.nih.gov/pubmed/27585568
http://dx.doi.org/10.1186/s12859-016-1119-1
work_keys_str_mv AT zhongcuncong graspxefficienthomologsearchofshortpeptidemetagenomedatabasethroughsimultaneousalignmentandassembly
AT yangyoungik graspxefficienthomologsearchofshortpeptidemetagenomedatabasethroughsimultaneousalignmentandassembly
AT yoosephshibu graspxefficienthomologsearchofshortpeptidemetagenomedatabasethroughsimultaneousalignmentandassembly