PepGM: a probabilistic graphical model for taxonomic inference of viral proteome samples with associated confidence scores

Bibliographic Details
Main authors: Holstein, Tanja; Kistner, Franziska; Martens, Lennart; Muth, Thilo
Format: Online article, text
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10182852/
https://www.ncbi.nlm.nih.gov/pubmed/37129543
http://dx.doi.org/10.1093/bioinformatics/btad289
Collection: PubMed
Abstract:
MOTIVATION: Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and corresponding taxa must be inferred from a list of identified peptides, which is often complicated by protein homology: many proteins share peptides not only within a taxon but also between taxa. However, correct taxonomic inference is crucial when identifying different viral strains with high sequence homology, given, e.g., the different epidemiological characteristics of the various strains of severe acute respiratory syndrome-related coronavirus-2. Additionally, many viruses mutate frequently, further complicating the correct identification of viral proteomic samples.
RESULTS: We present PepGM, a probabilistic graphical model for the taxonomic assignment of virus proteomic samples with strain-level resolution and associated confidence scores. PepGM combines the results of a standard proteomic database search algorithm with belief propagation to calculate the marginal distributions, and thus confidence scores, for potential taxonomic assignments. We demonstrate the performance of PepGM on several publicly available virus proteomic datasets, showing that it achieves strain-level resolution. In two out of eight cases, the taxonomic assignments were correct only at the species level, which PepGM clearly indicates through lower confidence scores.
AVAILABILITY AND IMPLEMENTATION: PepGM is written in Python and embedded into a Snakemake workflow. It is available at https://github.com/BAMeScience/PepGM.
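The RESULTS paragraph says PepGM uses belief propagation to compute marginal distributions that serve as confidence scores for taxonomic assignments. As a rough illustration of that general idea only, the Python sketch below runs sum-product belief propagation on a hypothetical toy factor graph with two taxa and two peptides (`pep_1` unique to taxon A, `pep_2` shared by A and B). The noisy-OR peptide model, the uniform priors, the soft-evidence scores and every name (`noisy_or_table`, `belief_propagation`, `FACTORS`, ...) are assumptions made for this example; they are not PepGM's factors, parameters or code. Because the toy graph is a tree, the BP marginals should agree with brute-force enumeration, which the sketch compares.

```python
"""Toy sum-product belief propagation for peptide -> taxon inference.

Illustrative sketch only: the model, parameters and names are hypothetical
and are not taken from PepGM.
"""
from itertools import product
from math import prod

VARIABLES = ["taxon_A", "taxon_B", "pep_1", "pep_2"]  # binary: 1 = present/detected


def noisy_or_table(n_parents, hit=0.9, noise=0.1):
    """Hypothetical P(peptide | parent taxa) modelled as a noisy-OR table."""
    table = {}
    for parents in product((0, 1), repeat=n_parents):
        p_on = 1.0 - (1.0 - noise) * (1.0 - hit) ** sum(parents)
        table[parents + (1,)] = p_on
        table[parents + (0,)] = 1.0 - p_on
    return table


# Each factor is (scope, table); table maps 0/1 tuples (in scope order) to a weight.
FACTORS = [
    (("taxon_A",), {(0,): 0.5, (1,): 0.5}),                # uniform prior on taxon A
    (("taxon_B",), {(0,): 0.5, (1,): 0.5}),                # uniform prior on taxon B
    (("taxon_A", "pep_1"), noisy_or_table(1)),             # pep_1 occurs only in taxon A
    (("taxon_A", "taxon_B", "pep_2"), noisy_or_table(2)),  # pep_2 is shared by A and B
    (("pep_1",), {(0,): 0.2, (1,): 0.8}),                  # hypothetical soft evidence (search score)
    (("pep_2",), {(0,): 0.05, (1,): 0.95}),
]


def normalize(msg):
    total = msg[0] + msg[1]
    return {0: msg[0] / total, 1: msg[1] / total}


def belief_propagation(variables, factors, iterations=10):
    """Sum-product message passing; returns P(variable = 1) for each variable."""
    edges = [(fi, v) for fi, (scope, _) in enumerate(factors) for v in scope]
    f2v = {e: {0: 1.0, 1: 1.0} for e in edges}  # factor -> variable messages
    for _ in range(iterations):
        # Variable -> factor: product of messages from the variable's other factors.
        v2f = {
            (fi, v): normalize({
                x: prod(f2v[(fj, u)][x] for fj, u in edges if u == v and fj != fi)
                for x in (0, 1)
            })
            for fi, v in edges
        }
        # Factor -> variable: weight times incoming messages, other variables summed out.
        new_f2v = {}
        for fi, v in edges:
            scope, table = factors[fi]
            msg = {0: 0.0, 1: 0.0}
            for assign, weight in table.items():
                incoming = prod(v2f[(fi, u)][x] for u, x in zip(scope, assign) if u != v)
                msg[assign[scope.index(v)]] += weight * incoming
            new_f2v[(fi, v)] = normalize(msg)
        f2v = new_f2v
    # Belief (marginal) = normalized product of all incoming factor messages.
    return {
        v: normalize({x: prod(f2v[(fi, u)][x] for fi, u in edges if u == v) for x in (0, 1)})[1]
        for v in variables
    }


def brute_force_marginals(variables, factors):
    """Exact P(variable = 1) by enumerating all 2**n joint assignments."""
    weights = {}
    for assign in product((0, 1), repeat=len(variables)):
        value = dict(zip(variables, assign))
        weights[assign] = prod(table[tuple(value[u] for u in scope)] for scope, table in factors)
    z = sum(weights.values())
    return {v: sum(w for a, w in weights.items() if a[i] == 1) / z for i, v in enumerate(variables)}


if __name__ == "__main__":
    bp = belief_propagation(VARIABLES, FACTORS)
    exact = brute_force_marginals(VARIABLES, FACTORS)
    for name in ("taxon_A", "taxon_B"):
        # On this tree-shaped graph BP is exact: taxon_A ≈ 0.837, taxon_B ≈ 0.576.
        print(f"{name}: BP={bp[name]:.3f}, exact={exact[name]:.3f}")
```

In this toy setup taxon A receives the higher marginal because it alone explains the unique peptide, while the shared peptide leaves taxon B only weakly supported. On graphs with cycles (for example, several peptides shared by the same pair of taxa), the same message updates become loopy belief propagation, which yields approximate rather than exact marginals.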
Record ID: pubmed-10182852
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Article type: Original Paper
Publication date: 2023-05-02
Copyright: © The Author(s) 2023. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.