Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data

Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial communit...

Bibliographic Details
Main authors: Thippabhotla, Sirisha; Liu, Ben; Podgorny, Adam; Yooseph, Shibu; Yang, Youngik; Zhang, Jun; Zhong, Cuncong
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006731/
https://www.ncbi.nlm.nih.gov/pubmed/36915411
http://dx.doi.org/10.1093/nargab/lqad023
_version_ 1784905364316094464
author Thippabhotla, Sirisha
Liu, Ben
Podgorny, Adam
Yooseph, Shibu
Yang, Youngik
Zhang, Jun
Zhong, Cuncong
collection PubMed
description Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
format Online
Article
Text
id pubmed-10006731
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10006731 2023-03-12 Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data Thippabhotla, Sirisha Liu, Ben Podgorny, Adam Yooseph, Shibu Yang, Youngik Zhang, Jun Zhong, Cuncong NAR Genom Bioinform Methods Article Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
Oxford University Press 2023-03-11 /pmc/articles/PMC10006731/ /pubmed/36915411 http://dx.doi.org/10.1093/nargab/lqad023 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods Article
Thippabhotla, Sirisha
Liu, Ben
Podgorny, Adam
Yooseph, Shibu
Yang, Youngik
Zhang, Jun
Zhong, Cuncong
Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data
title Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006731/
https://www.ncbi.nlm.nih.gov/pubmed/36915411
http://dx.doi.org/10.1093/nargab/lqad023