Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads
BACKGROUND: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequ...
Main Authors: | Huson, Daniel H.; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P.; Nieselt, Kay; Williams, Rohan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5267372/ https://www.ncbi.nlm.nih.gov/pubmed/28122610 http://dx.doi.org/10.1186/s40168-017-0233-2 |
_version_ | 1782500625311858688 |
---|---|
author | Huson, Daniel H.; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P.; Nieselt, Kay; Williams, Rohan |
author_facet | Huson, Daniel H.; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P.; Nieselt, Kay; Williams, Rohan |
author_sort | Huson, Daniel H. |
collection | PubMed |
description | BACKGROUND: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. METHODS: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. RESULTS: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. CONCLUSIONS: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way. |
format | Online Article Text |
id | pubmed-5267372 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-52673722017-02-01 Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads Huson, Daniel H. Tappu, Rewati Bazinet, Adam L Xie, Chao Cummings, Michael P. Nieselt, Kay Williams, Rohan Microbiome Methodology BACKGROUND: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. METHODS: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. RESULTS: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. CONCLUSIONS: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way. BioMed Central 2017-01-25 /pmc/articles/PMC5267372/ /pubmed/28122610 http://dx.doi.org/10.1186/s40168-017-0233-2 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Huson, Daniel H. Tappu, Rewati Bazinet, Adam L Xie, Chao Cummings, Michael P. Nieselt, Kay Williams, Rohan Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title | Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title_full | Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title_fullStr | Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title_full_unstemmed | Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title_short | Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
title_sort | fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5267372/ https://www.ncbi.nlm.nih.gov/pubmed/28122610 http://dx.doi.org/10.1186/s40168-017-0233-2 |
work_keys_str_mv | AT husondanielh fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT tappurewati fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT bazinetadaml fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT xiechao fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT cummingsmichaelp fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT nieseltkay fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads AT williamsrohan fastandsimpleproteinalignmentguidedassemblyoforthologousgenefamiliesfrommicrobiomesequencingreads |