
BugMat and FindNeighbour: command line and server applications for investigating bacterial relatedness

BACKGROUND: Large scale bacterial sequencing has made the determination of genetic relationships within large sequence collections of bacterial genomes derived from the same microbial species an increasingly common task. Solutions to the problem have application to public health (for example, in the...


Bibliographic Details
Main authors: Mazariegos-Canellas, Oriol, Do, Trien, Peto, Tim, Eyre, David W., Underwood, Anthony, Crook, Derrick, Wyllie, David H.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5683244/
https://www.ncbi.nlm.nih.gov/pubmed/29132318
http://dx.doi.org/10.1186/s12859-017-1907-2
author Mazariegos-Canellas, Oriol
Do, Trien
Peto, Tim
Eyre, David W.
Underwood, Anthony
Crook, Derrick
Wyllie, David H.
collection PubMed
description BACKGROUND: Large scale bacterial sequencing has made the determination of genetic relationships within large sequence collections of bacterial genomes derived from the same microbial species an increasingly common task. Solutions to the problem have application to public health (for example, in the detection of possible disease transmission), and as part of divide-and-conquer strategies selecting groups of similar isolates for computationally intensive methods of phylogenetic inference using (for example) maximal likelihood methods. However, the generation and maintenance of distance matrices is computationally intensive, and rapid methods of doing so are needed to allow translation of microbial genomics into public health actions. RESULTS: We developed, tested and deployed three solutions. BugMat is a fast C++ application which generates one-off in-memory distance matrices. FindNeighbour and FindNeighbour2 are server-side applications which build, maintain, and persist either complete (for FindNeighbour) or sparse (for FindNeighbour2) distance matrices given a set of sequences. FindNeighbour and BugMat use a variation model to accelerate computation, while FindNeighbour2 uses reference-based compression. Performance metrics show scalability into tens of thousands of sequences, with options for scaling further. CONCLUSION: Three applications, each with distinct strengths and weaknesses, are available for distance-matrix based analysis of large bacterial collections. Deployed as part of the Public Health England solution for M. tuberculosis genomic processing, they will have wide applicability. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1907-2) contains supplementary material, which is available to authorized users.
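The distance matrices mentioned in the abstract can be illustrated with a minimal sketch. It assumes sequences are already mapped to a common reference and are of equal length, that uncalled positions are marked with `N` or `-` and ignored, and that a "sparse" matrix means keeping only pairs at or below a distance threshold. The function names and example data are hypothetical; this is not the BugMat or FindNeighbour implementation, which the abstract says rely on a variation model or reference-based compression.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned, equal-length sequences.

    Assumption: positions where either sequence holds 'N' or '-' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in "N-" and y not in "N-"
    )


def sparse_distance_matrix(seqs: dict[str, str], threshold: int) -> dict[tuple[str, str], int]:
    """Return only the pairs whose distance is at or below the threshold."""
    matrix = {}
    # Visit each unordered pair once, in a deterministic (sorted) order.
    for (name1, seq1), (name2, seq2) in combinations(sorted(seqs.items()), 2):
        d = snp_distance(seq1, seq2)
        if d <= threshold:
            matrix[(name1, name2)] = d
    return matrix


# Hypothetical toy data, for illustration only.
example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACNA"}
close_pairs = sparse_distance_matrix(example, threshold=2)
```

A complete matrix corresponds to a threshold at least as large as the sequence length. The naive all-pairs loop is quadratic in the number of sequences, which is the cost the abstract describes as computationally intensive.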
format Online
Article
Text
id pubmed-5683244
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5683244 2017-11-20 BMC Bioinformatics Software BioMed Central 2017-11-13 /pmc/articles/PMC5683244/ /pubmed/29132318 http://dx.doi.org/10.1186/s12859-017-1907-2 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title BugMat and FindNeighbour: command line and server applications for investigating bacterial relatedness
topic Software