
The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of...


Bibliographic Details
Main Authors: Sahl, Jason W., Caporaso, J. Gregory, Rasko, David A., Keim, Paul
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2014
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3976120/
https://www.ncbi.nlm.nih.gov/pubmed/24749011
http://dx.doi.org/10.7717/peerj.332
author Sahl, Jason W.
Caporaso, J. Gregory
Rasko, David A.
Keim, Paul
collection PubMed
description Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27–57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates.
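The abstract describes a matrix of BSR values and its use for finding clade-specific markers, but does not spell out the calculation. As a rough illustration only, the sketch below assumes the usual definition of a BLAST score ratio (the bit score of a CDS aligned against a genome, divided by the bit score of that CDS aligned against itself). The function names, the toy bit scores, and the 0.8 / 0.4 presence and absence thresholds are all assumptions made for this example; none of them are taken from the record above, and this is not the LS-BSR code itself.

```python
from typing import Dict, List


def blast_score_ratio(query_bits: float, self_bits: float) -> float:
    """Return query bit score / self-hit bit score (0.0 if there is no self score)."""
    if self_bits <= 0:
        return 0.0
    return query_bits / self_bits


def bsr_matrix(
    self_bits: Dict[str, float],
    query_bits: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Build a CDS-by-genome matrix of BSR values.

    self_bits maps each CDS to its self-alignment bit score.
    query_bits maps each genome to the bit scores of the CDSs found in it;
    a CDS with no hit in a genome is scored 0.0.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for cds, ref in self_bits.items():
        matrix[cds] = {
            genome: blast_score_ratio(hits.get(cds, 0.0), ref)
            for genome, hits in query_bits.items()
        }
    return matrix


def group_specific_markers(
    matrix: Dict[str, Dict[str, float]],
    in_group: List[str],
    out_group: List[str],
    present: float = 0.8,  # assumed "conserved" threshold (illustrative)
    absent: float = 0.4,   # assumed "missing" threshold (illustrative)
) -> List[str]:
    """List CDSs conserved in every in_group genome and missing from every out_group genome."""
    markers = []
    for cds, row in matrix.items():
        conserved = all(row.get(g, 0.0) >= present for g in in_group)
        missing = all(row.get(g, 0.0) < absent for g in out_group)
        if conserved and missing:
            markers.append(cds)
    return sorted(markers)


if __name__ == "__main__":
    # Toy, made-up bit scores for two CDSs across three genomes.
    self_bits = {"cdsA": 200.0, "cdsB": 100.0}
    query_bits = {
        "g1": {"cdsA": 200.0, "cdsB": 95.0},
        "g2": {"cdsA": 190.0, "cdsB": 20.0},
        "g3": {"cdsA": 180.0},
    }
    matrix = bsr_matrix(self_bits, query_bits)
    print(group_specific_markers(matrix, in_group=["g1"], out_group=["g2", "g3"]))
```

In this toy input, cdsA scores highly in all three genomes and so cannot separate them, while cdsB is near-identical in g1 and weak or absent elsewhere, which is the kind of pattern a clade-specific marker search looks for.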
format Online
Article
Text
id pubmed-3976120
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-3976120 2014-04-18 PeerJ Bioinformatics PeerJ Inc. 2014-04-01 /pmc/articles/PMC3976120/ /pubmed/24749011 http://dx.doi.org/10.7717/peerj.332 Text en © 2014 Sahl et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes
topic Bioinformatics