MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets

Bibliographic Details
Main Authors: Potgieter, Matthys G., Nel, Andrew J. M., Fortuin, Suereta, Garnett, Shaun, Wendoh, Jerome M., Tabb, David L., Mulder, Nicola J., Blackburn, Jonathan M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10310047/
https://www.ncbi.nlm.nih.gov/pubmed/37327214
http://dx.doi.org/10.1371/journal.pcbi.1011163
author Potgieter, Matthys G.
Nel, Andrew J. M.
Fortuin, Suereta
Garnett, Shaun
Wendoh, Jerome M.
Tabb, David L.
Mulder, Nicola J.
Blackburn, Jonathan M.
collection PubMed
description BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.
RESULTS: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.
CONCLUSIONS: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
format Online
Article
Text
id pubmed-10310047
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10310047 2023-06-30 MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets Potgieter, Matthys G. Nel, Andrew J. M. Fortuin, Suereta Garnett, Shaun Wendoh, Jerome M. Tabb, David L. Mulder, Nicola J. Blackburn, Jonathan M. PLoS Comput Biol Research Article
Public Library of Science 2023-06-16 /pmc/articles/PMC10310047/ /pubmed/37327214 http://dx.doi.org/10.1371/journal.pcbi.1011163 Text en © 2023 Potgieter et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10310047/
https://www.ncbi.nlm.nih.gov/pubmed/37327214
http://dx.doi.org/10.1371/journal.pcbi.1011163