PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from t...
Main Authors: | Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209729/ https://www.ncbi.nlm.nih.gov/pubmed/28051068 http://dx.doi.org/10.1038/srep39194 |
_version_ | 1782490784773177344 |
---|---|
author | Deneke, Carlus Rentzsch, Robert Renard, Bernhard Y. |
author_facet | Deneke, Carlus Rentzsch, Robert Renard, Bernhard Y. |
author_sort | Deneke, Carlus |
collection | PubMed |
description | The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results. |
format | Online Article Text |
id | pubmed-5209729 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-52097292017-01-05 PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data Deneke, Carlus Rentzsch, Robert Renard, Bernhard Y. Sci Rep Article The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results. Nature Publishing Group 2017-01-04 /pmc/articles/PMC5209729/ /pubmed/28051068 http://dx.doi.org/10.1038/srep39194 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Deneke, Carlus Rentzsch, Robert Renard, Bernhard Y. PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title | PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title_full | PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title_fullStr | PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title_full_unstemmed | PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title_short | PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data |
title_sort | paprbag: a machine learning approach for the detection of novel pathogens from ngs data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209729/ https://www.ncbi.nlm.nih.gov/pubmed/28051068 http://dx.doi.org/10.1038/srep39194 |
work_keys_str_mv | AT denekecarlus paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata AT rentzschrobert paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata AT renardbernhardy paprbagamachinelearningapproachforthedetectionofnovelpathogensfromngsdata |