Predicting the pathogenicity of bacterial genomes using widely spread protein families

Bibliographic Details
Main authors: Naor-Hoffmann, Shaked, Svetlitsky, Dina, Sal-Man, Neta, Orenstein, Yaron, Ziv-Ukelson, Michal
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9233384/
https://www.ncbi.nlm.nih.gov/pubmed/35751023
http://dx.doi.org/10.1186/s12859-022-04777-w
author Naor-Hoffmann, Shaked
Svetlitsky, Dina
Sal-Man, Neta
Orenstein, Yaron
Ziv-Ukelson, Michal
author_sort Naor-Hoffmann, Shaked
collection PubMed
description BACKGROUND: The human body is inhabited by a diverse community of commensal non-pathogenic bacteria, many of which are essential for our health. By contrast, pathogenic bacteria have the ability to invade their hosts and cause a disease. Characterizing the differences between pathogenic and commensal non-pathogenic bacteria is important for the detection of emerging pathogens and for the development of new treatments. Previous methods for classification of bacteria as pathogenic or non-pathogenic used either raw genomic reads or protein families as features. Using protein families instead of reads provided a better interpretability of the resulting model. However, the accuracy of protein-families-based classifiers can still be improved. RESULTS: We developed a wide scope pathogenicity classifier (WSPC), a new protein-content-based machine-learning classification model. We trained WSPC on a newly curated dataset of 641 bacterial genomes, where each genome belongs to a different species. A comparative analysis we conducted shows that WSPC outperforms existing models on two benchmark test sets. We observed that the most discriminative protein-family features in WSPC are widely spread among bacterial species. These features correspond to proteins that are involved in the ability of bacteria to survive and replicate during an infection, rather than proteins that are directly involved in damaging or invading the host.
format Online
Article
Text
id pubmed-9233384
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9233384 2022-06-26 BMC Bioinformatics Research
BioMed Central 2022-06-24 /pmc/articles/PMC9233384/ /pubmed/35751023 http://dx.doi.org/10.1186/s12859-022-04777-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Predicting the pathogenicity of bacterial genomes using widely spread protein families
topic Research