PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data

The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from t...

Bibliographic Details
Main Authors: Deneke, Carlus, Rentzsch, Robert, Renard, Bernhard Y.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209729/
https://www.ncbi.nlm.nih.gov/pubmed/28051068
http://dx.doi.org/10.1038/srep39194
_version_ 1782490784773177344
author Deneke, Carlus
Rentzsch, Robert
Renard, Bernhard Y.
author_facet Deneke, Carlus
Rentzsch, Robert
Renard, Bernhard Y.
author_sort Deneke, Carlus
collection PubMed
description The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.
format Online
Article
Text
id pubmed-5209729
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-52097292017-01-05 PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data Deneke, Carlus Rentzsch, Robert Renard, Bernhard Y. Sci Rep Article The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results. Nature Publishing Group 2017-01-04 /pmc/articles/PMC5209729/ /pubmed/28051068 http://dx.doi.org/10.1038/srep39194 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Deneke, Carlus
Rentzsch, Robert
Renard, Bernhard Y.
PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title_full PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title_fullStr PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title_full_unstemmed PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title_short PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
title_sort paprbag: a machine learning approach for the detection of novel pathogens from ngs data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209729/
https://www.ncbi.nlm.nih.gov/pubmed/28051068
http://dx.doi.org/10.1038/srep39194
work_keys_str_mv AT denekecarlus paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata
AT rentzschrobert paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata
AT renardbernhardy paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata