Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets

BACKGROUND: Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only...

Bibliographic Details
Main authors:	Narechania, Apurva; Baker, Richard; DeSalle, Rob; Mathema, Barun; Kolokotronis, Sergios-Orestis; Kreiswirth, Barry; Planet, Paul J.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5078944/ https://www.ncbi.nlm.nih.gov/pubmed/27776538 http://dx.doi.org/10.1186/s13742-016-0152-3

author	Narechania, Apurva; Baker, Richard; DeSalle, Rob; Mathema, Barun; Kolokotronis, Sergios-Orestis; Kreiswirth, Barry; Planet, Paul J.
author_sort	Narechania, Apurva
collection	PubMed
description	BACKGROUND: Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. FINDINGS: In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. CONCLUSIONS: Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic ‘core’ of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to ‘flock’ any type of data.
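The flocking idea summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Clusterflock's own implementation: the toroidal arena, the neighbourhood `radius`, the weighting `w = 1 - 2d` (attract when the distance is below 0.5, repel above it, which quietly fixes a pivot the real tool avoids), the alignment term, and the single-linkage `link` rule used to read off flocks are all hypothetical choices. `normalized_rf`, a Robinson-Foulds distance on sets of bipartitions scaled to [0, 1], stands in for an incongruence metric; any function returning a value on the unit interval can be plugged in instead.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]


def normalized_rf(splits_a: frozenset, splits_b: frozenset) -> float:
    """Robinson-Foulds distance between two bipartition sets, scaled to [0, 1]."""
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0


def flock(items: Sequence, dist: Callable[[object, object], float],
          size: float = 50.0, radius: float = 5.0, steps: int = 300,
          max_speed: float = 1.0, seed: int = 0) -> List[Vec]:
    """Move one agent per item on a torus (hypothetical rules); return final positions."""
    rng = random.Random(seed)
    n = len(items)
    # Pairwise dissimilarities on the unit interval, computed once up front.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    pos = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
    vel = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

    def offset(a: float, b: float) -> float:
        # Shortest signed displacement from a to b on a ring of length `size`.
        delta = (b - a) % size
        return delta - size if delta > size / 2 else delta

    for _ in range(steps):
        new_vel = []
        for i in range(n):
            vx, vy = vel[i]
            for j in range(n):
                if i == j:
                    continue
                dx = offset(pos[i][0], pos[j][0])
                dy = offset(pos[i][1], pos[j][1])
                r = math.hypot(dx, dy)
                if r == 0 or r > radius:
                    continue  # agents react only to their local neighbourhood
                # Hypothetical weighting: w > 0 attracts (d < 0.5), w < 0 repels.
                w = 1.0 - 2.0 * d[i][j]
                vx += 0.05 * w * dx / r
                vy += 0.05 * w * dy / r
                if w > 0:  # alignment: steer toward the heading of similar neighbours
                    vx += 0.05 * w * (vel[j][0] - vel[i][0])
                    vy += 0.05 * w * (vel[j][1] - vel[i][1])
            speed = math.hypot(vx, vy)
            if speed > max_speed:  # cap each agent's speed at max_speed
                vx, vy = vx / speed * max_speed, vy / speed * max_speed
            new_vel.append((vx, vy))
        vel = new_vel  # synchronous update: every agent saw the same snapshot
        pos = [((x + vx) % size, (y + vy) % size)
               for (x, y), (vx, vy) in zip(pos, vel)]
    return pos


def flocks(pos: List[Vec], size: float = 50.0, link: float = 2.0) -> List[List[int]]:
    """Read off flocks as single-linkage groups of agents within `link` (assumed rule)."""
    def torus_dist(p: Vec, q: Vec) -> float:
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        return math.hypot(min(dx, size - dx), min(dy, size - dy))

    seen, groups = set(), []
    for start in range(len(pos)):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(len(pos)):
                if j not in seen and torus_dist(pos[i], pos[j]) <= link:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Toy data: two conflicting 6-taxon topologies; each split is written as the side holding "A".
X = frozenset({frozenset("AB"), frozenset("ABC"), frozenset("ABCD")})
Y = frozenset({frozenset("AD"), frozenset("ADE"), frozenset("ABDE")})
positions = flock([X] * 4 + [Y] * 4, normalized_rf)
print(flocks(positions))
```

Here the two toy topologies share no bipartitions, so `normalized_rf(X, Y)` is 1.0 and their agents repel, while copies of the same topology attract. Whether they actually end up as two clean flocks depends on the arena size, radius, step count and seed; with parameters this small the outcome is not guaranteed. The design choice worth noting is that each agent only ever consults distances to agents inside its radius, which is where the abstract's potential saving in comparisons would come from if distances were computed lazily rather than precomputed as in this sketch.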
format	Online Article Text
id	pubmed-5078944
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5078944 2016-10-31 Gigascience Technical Note BioMed Central 2016-10-24 /pmc/articles/PMC5078944/ /pubmed/27776538 http://dx.doi.org/10.1186/s13742-016-0152-3 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5078944/ https://www.ncbi.nlm.nih.gov/pubmed/27776538 http://dx.doi.org/10.1186/s13742-016-0152-3