Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads

Bibliographic Details
Main authors: Huson, Daniel H., Tappu, Rewati, Bazinet, Adam L, Xie, Chao, Cummings, Michael P., Nieselt, Kay, Williams, Rohan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5267372/
https://www.ncbi.nlm.nih.gov/pubmed/28122610
http://dx.doi.org/10.1186/s40168-017-0233-2
collection PubMed
description BACKGROUND: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. METHODS: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. RESULTS: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. CONCLUSIONS: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
format Online
Article
Text
id pubmed-5267372
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5267372 2017-02-01 Microbiome Methodology BioMed Central 2017-01-25 /pmc/articles/PMC5267372/ /pubmed/28122610 http://dx.doi.org/10.1186/s40168-017-0233-2 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology