Cargando…

A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches

Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their...

Descripción completa

Detalles Bibliográficos
Autores principales:	Kristensen, David M., Kannan, Lavanya, Coleman, Michael K., Wolf, Yuri I., Sorokin, Alexander, Koonin, Eugene V., Mushegian, Arcady
Formato:	Texto
Lenguaje:	English
Publicado:	Oxford University Press 2010
Materias:	Original Papers
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881409/ https://www.ncbi.nlm.nih.gov/pubmed/20439257 http://dx.doi.org/10.1093/bioinformatics/btq229

_version_	1782182117572083712
author	Kristensen, David M. Kannan, Lavanya Coleman, Michael K. Wolf, Yuri I. Sorokin, Alexander Koonin, Eugene V. Mushegian, Arcady
author_facet	Kristensen, David M. Kannan, Lavanya Coleman, Michael K. Wolf, Yuri I. Sorokin, Alexander Koonin, Eugene V. Mushegian, Arcady
author_sort	Kristensen, David M.
collection	PubMed
description	Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined. Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g(3)) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g(6)). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g(3)log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs. Availability and implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/ Contact: dmk@stowers.org Supplementary information: Supplementary materials are available at Bioinformatics online.
format	Text
id	pubmed-2881409
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-28814092010-06-08 A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches Kristensen, David M. Kannan, Lavanya Coleman, Michael K. Wolf, Yuri I. Sorokin, Alexander Koonin, Eugene V. Mushegian, Arcady Bioinformatics Original Papers Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined. Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g(3)) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g(6)). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g(3)log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs. Availability and implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/ Contact: dmk@stowers.org Supplementary information: Supplementary materials are available at Bioinformatics online. Oxford University Press 2010-06-15 2010-05-02 /pmc/articles/PMC2881409/ /pubmed/20439257 http://dx.doi.org/10.1093/bioinformatics/btq229 Text en © The Author 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Kristensen, David M. Kannan, Lavanya Coleman, Michael K. Wolf, Yuri I. Sorokin, Alexander Koonin, Eugene V. Mushegian, Arcady A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title	A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title_full	A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title_fullStr	A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title_full_unstemmed	A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title_short	A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
title_sort	low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881409/ https://www.ncbi.nlm.nih.gov/pubmed/20439257 http://dx.doi.org/10.1093/bioinformatics/btq229
work_keys_str_mv	AT kristensendavidm alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT kannanlavanya alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT colemanmichaelk alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT wolfyurii alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT sorokinalexander alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT koonineugenev alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT mushegianarcady alowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT kristensendavidm lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT kannanlavanya lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT colemanmichaelk lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT wolfyurii lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT sorokinalexander lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT koonineugenev lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches AT mushegianarcady lowpolynomialalgorithmforassemblingclustersoforthologousgroupsfromintergenomicsymmetricbestmatches

A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches

Ejemplares similares