GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

Bibliographic Details
Main author:	Thomas, Paul D
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905364/ https://www.ncbi.nlm.nih.gov/pubmed/20534164 http://dx.doi.org/10.1186/1471-2105-11-312

_version_	1782183954846056448
author	Thomas, Paul D
collection	PubMed
description	BACKGROUND: Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. RESULTS: We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. CONCLUSIONS: GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
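The agglomerative idea in the abstract can be illustrated with a toy sketch. This is not the GIGA implementation: the abstract does not spell out GIGA's joining rules, so the code below makes two labeled assumptions. First, "no distance averaging over clades" is approximated by taking the minimum pairwise distance between members of two clusters. Second, a join is labeled a speciation when the two clusters share no species and a duplication otherwise, a common species-overlap heuristic that stands in for GIGA's species-tree rules. The function names (`p_distance`, `build_tree`) and the example data are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def build_tree(genes):
    """Toy agglomerative gene tree builder (illustrative, not GIGA itself).

    genes: dict mapping gene name -> (species, aligned sequence).
    Returns nested tuples (event, left, right); leaves are gene names.
    """
    if not genes:
        raise ValueError("need at least one gene")
    # Pairwise distance matrix over all genes.
    dist = {
        frozenset((g, h)): p_distance(genes[g][1], genes[h][1])
        for g, h in combinations(genes, 2)
    }
    # Each cluster is (subtree, species seen, member gene names).
    clusters = [(name, {sp}, [name]) for name, (sp, _) in genes.items()]

    def cluster_distance(c1, c2):
        # Assumption: closest member pair, with no averaging over the clade.
        return min(dist[frozenset((g, h))] for g in c1[2] for h in c2[2])

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        # Assumption: species-overlap rule for interpreting each join.
        event = "speciation" if left[1].isdisjoint(right[1]) else "duplication"
        merged = ((event, left[0], right[0]), left[1] | right[1], left[2] + right[2])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical example: two paralogs (A, B), each present in human and mouse.
example_genes = {
    "humA": ("human", "AAAAAAAA"),
    "musA": ("mouse", "AAAAAAAT"),
    "humB": ("human", "CCCCAAAA"),
    "musB": ("mouse", "CCCCAAAT"),
}
tree = build_tree(example_genes)
```

In this example the two human/mouse pairs are joined first as orthologous (speciation-only) subtrees, and the final join connects them through a duplication, which mirrors the abstract's picture of orthologous subtrees linked by other evolutionary events.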
format	Text
id	pubmed-2905364
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2905364 2010-07-17 GIGA: a simple, efficient algorithm for gene tree inference in the genomic age Thomas, Paul D BMC Bioinformatics Methodology Article BioMed Central 2010-06-09 /pmc/articles/PMC2905364/ /pubmed/20534164 http://dx.doi.org/10.1186/1471-2105-11-312 Text en Copyright ©2010 Thomas; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	GIGA: a simple, efficient algorithm for gene tree inference in the genomic age
