Fine-scale differentiation between Bacillus anthracis and Bacillus cereus group signatures in metagenome shotgun data
BACKGROUND: It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especia...
Main Authors: | Petit III, Robert A.; Hogan, James M.; Ezewudo, Matthew N.; Joseph, Sandeep J.; Read, Timothy D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109372/ https://www.ncbi.nlm.nih.gov/pubmed/30155371 http://dx.doi.org/10.7717/peerj.5515 |
_version_ | 1783350318067613696 |
---|---|
author | Petit III, Robert A.; Hogan, James M.; Ezewudo, Matthew N.; Joseph, Sandeep J.; Read, Timothy D. |
author_facet | Petit III, Robert A.; Hogan, James M.; Ezewudo, Matthew N.; Joseph, Sandeep J.; Read, Timothy D. |
author_sort | Petit III, Robert A. |
collection | PubMed |
description | BACKGROUND: It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especially likely when two closely related organisms are present in the same sample. Bacillus anthracis, the etiologic agent of anthrax, is a high-consequence pathogen that shares >99% average nucleotide identity with Bacillus cereus group (BCerG) genomes. Our goal was to create an analysis tool that used k-mers to detect B. anthracis, incorporating information about the coverage of BCerG in the metagenome sample. METHODS: Using public complete genome sequence datasets, we identified a set of 31-mer signatures that differentiated B. anthracis from other members of the B. cereus group (BCerG), and another set which differentiated BCerG genomes (including B. anthracis) from other Bacillus strains. We also created a set of 31-mers for detecting the lethal factor gene, the key genetic diagnostic of the presence of anthrax-causing bacteria. We created synthetic sequence datasets based on existing genomes to test the accuracy of a k-mer-based detection model. RESULTS: We found 239,503 B. anthracis-specific 31-mers (the Ba31 set), 10,183 BCerG 31-mers (the BCerG31 set), and 2,617 lethal factor k-mers (the lef31 set). We showed that false positive B. anthracis k-mers, which arise from random sequencing errors, are observable at high genome coverages of B. cereus. We also showed that there is a “gray zone” below 0.184× coverage of the B. anthracis genome sequence, in which we cannot expect with high probability to identify lethal factor k-mers. We created a linear regression model to differentiate the presence of B. anthracis-like chromosomes from sequencing errors given the BCerG background coverage. We showed that while shotgun datasets from the New York City subway metagenome project had no matches to lef31 k-mers and hence were negative for B. anthracis, some samples showed evidence of strains very closely related to the pathogen. DISCUSSION: This work shows how extensive libraries of complete genomes can be used to create organism-specific signatures to help interpret metagenomes. We contrast “specialist” approaches to metagenome analysis such as this work with “generalist” software that seeks to classify all organisms present in the sample and note the more general utility of a k-mer filter approach when taxonomic boundaries lack clarity or high levels of precision are required. |
format | Online Article Text |
id | pubmed-6109372 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6109372 2018-08-28 Fine-scale differentiation between Bacillus anthracis and Bacillus cereus group signatures in metagenome shotgun data Petit III, Robert A.; Hogan, James M.; Ezewudo, Matthew N.; Joseph, Sandeep J.; Read, Timothy D. PeerJ Bioinformatics BACKGROUND: It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especially likely when two closely related organisms are present in the same sample. Bacillus anthracis, the etiologic agent of anthrax, is a high-consequence pathogen that shares >99% average nucleotide identity with Bacillus cereus group (BCerG) genomes. Our goal was to create an analysis tool that used k-mers to detect B. anthracis, incorporating information about the coverage of BCerG in the metagenome sample. METHODS: Using public complete genome sequence datasets, we identified a set of 31-mer signatures that differentiated B. anthracis from other members of the B. cereus group (BCerG), and another set which differentiated BCerG genomes (including B. anthracis) from other Bacillus strains. We also created a set of 31-mers for detecting the lethal factor gene, the key genetic diagnostic of the presence of anthrax-causing bacteria. We created synthetic sequence datasets based on existing genomes to test the accuracy of a k-mer-based detection model. RESULTS: We found 239,503 B. anthracis-specific 31-mers (the Ba31 set), 10,183 BCerG 31-mers (the BCerG31 set), and 2,617 lethal factor k-mers (the lef31 set). We showed that false positive B. anthracis k-mers, which arise from random sequencing errors, are observable at high genome coverages of B. cereus. We also showed that there is a “gray zone” below 0.184× coverage of the B. anthracis genome sequence, in which we cannot expect with high probability to identify lethal factor k-mers. We created a linear regression model to differentiate the presence of B. anthracis-like chromosomes from sequencing errors given the BCerG background coverage. We showed that while shotgun datasets from the New York City subway metagenome project had no matches to lef31 k-mers and hence were negative for B. anthracis, some samples showed evidence of strains very closely related to the pathogen. DISCUSSION: This work shows how extensive libraries of complete genomes can be used to create organism-specific signatures to help interpret metagenomes. We contrast “specialist” approaches to metagenome analysis such as this work with “generalist” software that seeks to classify all organisms present in the sample and note the more general utility of a k-mer filter approach when taxonomic boundaries lack clarity or high levels of precision are required. PeerJ Inc. 2018-08-22 /pmc/articles/PMC6109372/ /pubmed/30155371 http://dx.doi.org/10.7717/peerj.5515 Text en ©2018 Petit III et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Petit III, Robert A. Hogan, James M. Ezewudo, Matthew N. Joseph, Sandeep J. Read, Timothy D. Fine-scale differentiation between Bacillus anthracis and Bacillus cereus group signatures in metagenome shotgun data |
title | Fine-scale differentiation between Bacillus anthracis and Bacillus cereus group signatures in metagenome shotgun data |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109372/ https://www.ncbi.nlm.nih.gov/pubmed/30155371 http://dx.doi.org/10.7717/peerj.5515 |
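
The abstract describes screening metagenome reads against sets of signature 31-mers (Ba31, BCerG31, lef31). The following is a minimal Python sketch of that general idea, not the authors' tool: it breaks reads into 31-mers and counts how many distinct signature k-mers are observed. The function names (`canonical`, `build_signature_set`, `count_signature_hits`) are hypothetical, and the use of canonical k-mers (treating a k-mer and its reverse complement as one) is an assumption that the abstract does not confirm.

```python
from typing import Iterable, Iterator, Set

K = 31  # signature length named in the abstract (31-mers)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption: strand-independent matching; the record does not say
    whether the original signature sets were built this way.
    """
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def iter_kmers(sequence: str, k: int = K) -> Iterator[str]:
    """Yield canonical k-mers, skipping windows with ambiguous bases such as N."""
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if set(window) <= _VALID:
            yield canonical(window)


def build_signature_set(sequences: Iterable[str], k: int = K) -> Set[str]:
    """Collect every canonical k-mer from the given signature sequences."""
    signatures: Set[str] = set()
    for sequence in sequences:
        signatures.update(iter_kmers(sequence, k))
    return signatures


def count_signature_hits(reads: Iterable[str], signatures: Set[str], k: int = K) -> int:
    """Count the distinct signature k-mers seen in at least one read."""
    observed: Set[str] = set()
    for read in reads:
        for kmer in iter_kmers(read, k):
            if kmer in signatures:
                observed.add(kmer)
    return len(observed)
```

In use, one signature set would be built per target (for example, one for B. anthracis-specific k-mers and one for the BCerG background), and the hit counts from a sample compared. Because the abstract reports that sequencing errors in B. cereus reads can produce false positive B. anthracis k-mers, a raw hit count alone should not be read as detection. The abstract's regression against BCerG background coverage addresses that step and is not reproduced here.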