Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads

Bibliographic Details
Main authors: Liu, Jiemeng; Lian, Qichao; Chen, Yamao; Qi, Ji
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6412133/
https://www.ncbi.nlm.nih.gov/pubmed/30657979
http://dx.doi.org/10.1093/nar/gkz017
Description
Summary: Metagenomic studies, greatly advanced by the rapid development of next-generation sequencing (NGS) technologies, uncover the complex structures of microbial communities and their interactions with the environment. Because most microbes lack genome sequence information, it is essential to assemble prokaryotic genomes ab initio in order to retrieve complete coding genes from various metabolic pathways. The complex nature of microbial composition and the burden of handling vast amounts of metagenomic data pose great challenges to the development of effective and efficient bioinformatic tools. Here we present a protein assembler (MetaPA) that is based on de Bruijn graph searching in oligopeptide space and can be applied to both metagenomic and metatranscriptomic sequencing data. When public homologous protein sequences are used to guide the assembly procedure, MetaPA assembles 85% of total proteins as complete sequences with a high precision of 83% on real high-throughput sequencing datasets. Applying MetaPA to metatranscriptomic data successfully identifies the majority of actively transcribed genes validated in related studies. The results suggest that MetaPA has good potential in both metagenomic and metatranscriptomic studies for characterizing the composition and abundance of microbiota.
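For readers unfamiliar with the idea of a de Bruijn graph over oligopeptides, the following is a minimal illustrative sketch and not MetaPA's actual implementation. It assumes the reads have already been translated into peptide fragments, uses a hypothetical k-mer size of 4, and omits the homology guidance mentioned in the summary. The function names `build_peptide_graph` and `extend_unambiguous` and the example fragments are invented for illustration.

```python
from collections import defaultdict


def build_peptide_graph(peptides, k=4):
    """Build a de Bruijn graph over amino acid k-mers.

    Nodes are (k-1)-mers; each k-mer contributes an edge from its
    prefix (first k-1 residues) to its suffix (last k-1 residues).
    The value k=4 is an arbitrary choice for this toy example.
    """
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Extend a path from `start` while each node has exactly one successor."""
    path = start
    node = start
    seen = {start}
    # graph.get avoids creating empty entries in the defaultdict
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Hypothetical overlapping peptide fragments, as if translated from short reads.
fragments = ["MKTAYIAK", "AYIAKQRQ", "AKQRQISF"]
graph = build_peptide_graph(fragments, k=4)
assembled = extend_unambiguous(graph, "MKT")
print(assembled)
```

In this toy case the three fragments overlap without branching, so the walk joins them into a single peptide. Real metagenomic data produce branching graphs, which is where additional evidence such as homologous protein sequences would be needed to choose among paths.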