Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data
Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial communit...
Main authors: | Thippabhotla, Sirisha; Liu, Ben; Podgorny, Adam; Yooseph, Shibu; Yang, Youngik; Zhang, Jun; Zhong, Cuncong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006731/ https://www.ncbi.nlm.nih.gov/pubmed/36915411 http://dx.doi.org/10.1093/nargab/lqad023 |
_version_ | 1784905364316094464 |
---|---|
author | Thippabhotla, Sirisha Liu, Ben Podgorny, Adam Yooseph, Shibu Yang, Youngik Zhang, Jun Zhong, Cuncong |
author_facet | Thippabhotla, Sirisha Liu, Ben Podgorny, Adam Yooseph, Shibu Yang, Youngik Zhang, Jun Zhong, Cuncong |
author_sort | Thippabhotla, Sirisha |
collection | PubMed |
description | Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP. |
format | Online Article Text |
id | pubmed-10006731 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10006731 2023-03-12 Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data Thippabhotla, Sirisha Liu, Ben Podgorny, Adam Yooseph, Shibu Yang, Youngik Zhang, Jun Zhong, Cuncong NAR Genom Bioinform Methods Article Oxford University Press 2023-03-11 /pmc/articles/PMC10006731/ /pubmed/36915411 http://dx.doi.org/10.1093/nargab/lqad023 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Article Thippabhotla, Sirisha Liu, Ben Podgorny, Adam Yooseph, Shibu Yang, Youngik Zhang, Jun Zhong, Cuncong Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title | Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title_full | Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title_fullStr | Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title_full_unstemmed | Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title_short | Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
title_sort | integrated de novo gene prediction and peptide assembly of metagenomic sequencing data |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006731/ https://www.ncbi.nlm.nih.gov/pubmed/36915411 http://dx.doi.org/10.1093/nargab/lqad023 |