GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly
BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified...
Main Authors: | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009819/ https://www.ncbi.nlm.nih.gov/pubmed/27585568 http://dx.doi.org/10.1186/s12859-016-1119-1 |
_version_ | 1782451582432968704 |
---|---|
author | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
author_facet | Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu |
author_sort | Zhong, Cuncong |
collection | PubMed |
description | BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism. RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates. CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1119-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5009819 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5009819 2016-09-09 GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly Zhong, Cuncong; Yang, Youngik; Yooseph, Shibu BMC Bioinformatics Research BioMed Central 2016-08-31 /pmc/articles/PMC5009819/ /pubmed/27585568 http://dx.doi.org/10.1186/s12859-016-1119-1 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009819/ https://www.ncbi.nlm.nih.gov/pubmed/27585568 http://dx.doi.org/10.1186/s12859-016-1119-1 |