
Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics

Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, small contigs, each representing a short gene fragment, instead of complete genes, may be reported by an assem...


Bibliographic Details
Main Authors: Wu, Yu-Wei; Rho, Mina; Doak, Thomas G.; Ye, Yuzhen
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436815/
https://www.ncbi.nlm.nih.gov/pubmed/22962453
http://dx.doi.org/10.1093/bioinformatics/bts388
_version_ 1782242704117202944
author Wu, Yu-Wei
Rho, Mina
Doak, Thomas G.
Ye, Yuzhen
author_facet Wu, Yu-Wei
Rho, Mina
Doak, Thomas G.
Ye, Yuzhen
author_sort Wu, Yu-Wei
collection PubMed
description Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, small contigs, each representing a short gene fragment, instead of complete genes, may be reported by an assembler. This further complicates annotation of metagenomic datasets, as annotation tools (such as gene predictors or similarity search tools) typically perform poorly on contigs encoding short gene fragments. Results: We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive ‘gene paths’ in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic sequences, by connecting contigs with the guidance of homologous genes, information that is orthogonal to the sequencing reads. We note that the improvement of gene assembly can be observed even when only distantly related genes are available as the reference. We further propose to use ‘gene graphs’ to represent the assembly of reads from homologous genes and discuss potential applications of gene graphs to improving functional annotation for metagenomics. Availability: The tools are available as open source for download at http://omics.informatics.indiana.edu/GeneStitch Contact: yye@indiana.edu
format Online
Article
Text
id pubmed-3436815
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-34368152012-12-12 Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics Wu, Yu-Wei Rho, Mina Doak, Thomas G. Ye, Yuzhen Bioinformatics Original Papers Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, small contigs, each representing a short gene fragment, instead of complete genes, may be reported by an assembler. This further complicates annotation of metagenomic datasets, as annotation tools (such as gene predictors or similarity search tools) typically perform poorly on contigs encoding short gene fragments. Results: We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive ‘gene paths’ in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic sequences, by connecting contigs with the guidance of homologous genes, information that is orthogonal to the sequencing reads. We note that the improvement of gene assembly can be observed even when only distantly related genes are available as the reference. We further propose to use ‘gene graphs’ to represent the assembly of reads from homologous genes and discuss potential applications of gene graphs to improving functional annotation for metagenomics.
Availability: The tools are available as open source for download at http://omics.informatics.indiana.edu/GeneStitch Contact: yye@indiana.edu Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436815/ /pubmed/22962453 http://dx.doi.org/10.1093/bioinformatics/bts388 Text en © The Author(s) (2012). Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Wu, Yu-Wei
Rho, Mina
Doak, Thomas G.
Ye, Yuzhen
Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics
title Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436815/
https://www.ncbi.nlm.nih.gov/pubmed/22962453
http://dx.doi.org/10.1093/bioinformatics/bts388