Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach

Bibliographic Details
Main authors: Francisco, Alexandre P; Bugalho, Miguel; Ramirez, Mário; Carriço, João A
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705362/
https://www.ncbi.nlm.nih.gov/pubmed/19450271
http://dx.doi.org/10.1186/1471-2105-10-152
collection PubMed
description BACKGROUND: Multilocus Sequence Typing (MLST) is a frequently used typing method for analyzing the clonal relationships among strains of several clinically relevant microbial species. MLST is based on the sequences of housekeeping genes, which give each strain a distinct numerical allelic profile that is abbreviated to a unique identifier: the sequence type (ST). The relatedness between two strains can then be inferred from the differences between their allelic profiles. For a more comprehensive analysis of the possible patterns of evolutionary descent, a set of rules was proposed and implemented in the eBURST algorithm. These rules divide a data set into several clusters of related strains, dubbed clonal complexes, by implementing a simple model of clonal expansion and diversification. Within each clonal complex, the rules identify which links between STs correspond to the most probable pattern of descent. However, the eBURST algorithm is not globally optimized, which can result in links within the clonal complexes that violate the proposed rules. RESULTS: Here, we present a globally optimized implementation of the eBURST algorithm: goeBURST. The search for a globally optimal solution led to the formalization of the problem as a graphic matroid, for which greedy algorithms that provide an optimal solution exist. Several public MLST data sets were tested, and the differences found between the two implementations are discussed for five bacterial species: Enterococcus faecium, Streptococcus pneumoniae, Burkholderia pseudomallei, Campylobacter jejuni and Neisseria spp. A novel feature implemented in goeBURST is the representation of the tiebreak rule level reached before deciding whether a link should be drawn, which can be used to visually evaluate the reliability of the represented hypothetical pattern of descent.
CONCLUSION: goeBURST is a globally optimized implementation of the eBURST algorithm that identifies alternative patterns of descent for several bacterial species. Furthermore, the algorithm can be applied to any multilocus typing data based on the number of differences between numeric profiles. A software implementation is available at .
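The abstract states that casting link selection as a graphic matroid lets a greedy algorithm return a globally optimal set of links. The following is a minimal Python sketch of that idea, assuming Kruskal's algorithm with a union-find structure; it is not the published goeBURST implementation. The function names (`hamming`, `locus_variant_counts`, `goeburst_like_forest`, `clonal_complexes`), the toy data, and the exact edge ordering are illustrative assumptions. Each candidate link is ranked by the number of differing loci, then by a simplified version of the eBURST tiebreaks (more single-, double- and triple-locus variants, then more isolates, then the lower ST number), comparing the better-ranked endpoint first.

```python
from itertools import combinations


def hamming(p, q):
    """Number of loci at which two equal-length allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def locus_variant_counts(profiles, max_level=3):
    """counts[st][k] = number of other STs differing from st at exactly k loci (k = 1..max_level)."""
    counts = {st: [0] * (max_level + 1) for st in profiles}
    for a, b in combinations(profiles, 2):
        d = hamming(profiles[a], profiles[b])
        if 1 <= d <= max_level:
            counts[a][d] += 1
            counts[b][d] += 1
    return counts


class DisjointSet:
    """Union-find used by Kruskal's algorithm to reject links that would close a cycle."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected: this link would form a cycle
        self.parent[rb] = ra
        return True


def goeburst_like_forest(profiles, freq, link_level=1):
    """Greedy (Kruskal) spanning forest over links of at most `link_level` differing loci.

    Hypothetical simplified ordering: distance first, then the tiebreak key of the
    better-ranked endpoint, then the tiebreak key of the other endpoint.
    """
    counts = locus_variant_counts(profiles)

    def st_key(st):
        # More SLVs, DLVs, TLVs and isolates rank first; a lower ST number breaks remaining ties.
        return (-counts[st][1], -counts[st][2], -counts[st][3], -freq[st], st)

    edges = []
    for a, b in combinations(profiles, 2):
        d = hamming(profiles[a], profiles[b])
        if d <= link_level:
            best, other = sorted((a, b), key=st_key)
            edges.append((d, st_key(best), st_key(other), a, b))
    edges.sort()  # one global order over all candidate links

    ds = DisjointSet(profiles)
    return [(a, b, d) for d, _, _, a, b in edges if ds.union(a, b)]


def clonal_complexes(profiles, links):
    """Group STs into the connected components formed by the selected links."""
    ds = DisjointSet(profiles)
    for a, b, _ in links:
        ds.union(a, b)
    groups = {}
    for st in profiles:
        groups.setdefault(ds.find(st), []).append(st)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical seven-locus profiles: STs 1-4 form a cycle of SLVs, ST5 is unrelated.
    profiles = {
        1: (1, 1, 1, 1, 1, 1, 1),
        2: (1, 1, 1, 1, 1, 1, 2),
        3: (1, 1, 1, 1, 1, 2, 1),
        4: (1, 1, 1, 1, 1, 2, 2),
        5: (9, 9, 9, 9, 9, 9, 9),
    }
    freq = {1: 10, 2: 3, 3: 5, 4: 1, 5: 2}  # hypothetical isolates per ST
    links = goeburst_like_forest(profiles, freq)
    print(links)  # [(1, 3, 1), (1, 2, 1), (3, 4, 1)]
    print(clonal_complexes(profiles, links))  # [[1, 2, 3, 4], [5]]
```

Because every candidate link is considered in a single global order, a link is rejected only when its two STs are already joined by better-ranked links, which is the property the greedy matroid argument relies on. In the toy data, the SLV link between ST2 and ST4 is the one dropped, because ST4 is already connected through the better-ranked ST3.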
id pubmed-2705362
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2705362 2009-07-03 BMC Bioinformatics, Methodology Article. BioMed Central 2009-05-18. Copyright © 2009 Francisco et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article