Stratification of co-evolving genomic groups using ranked phylogenetic profiles

BACKGROUND: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes...

Bibliographic Details
Main Authors: Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775751/
https://www.ncbi.nlm.nih.gov/pubmed/19860884
http://dx.doi.org/10.1186/1471-2105-10-355
author Freilich, Shiri
Goldovsky, Leon
Gottlieb, Assaf
Blanc, Eric
Tsoka, Sophia
Ouzounis, Christos A
author_facet Freilich, Shiri
Goldovsky, Leon
Gottlieb, Assaf
Blanc, Eric
Tsoka, Sophia
Ouzounis, Christos A
author_sort Freilich, Shiri
collection PubMed
description BACKGROUND: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
format Text
id pubmed-2775751
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2775751 2009-11-11 Stratification of co-evolving genomic groups using ranked phylogenetic profiles Freilich, Shiri Goldovsky, Leon Gottlieb, Assaf Blanc, Eric Tsoka, Sophia Ouzounis, Christos A BMC Bioinformatics Methodology Article BioMed Central 2009-10-27 /pmc/articles/PMC2775751/ /pubmed/19860884 http://dx.doi.org/10.1186/1471-2105-10-355 Text en Copyright © 2009 Freilich et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Freilich, Shiri
Goldovsky, Leon
Gottlieb, Assaf
Blanc, Eric
Tsoka, Sophia
Ouzounis, Christos A
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
title Stratification of co-evolving genomic groups using ranked phylogenetic profiles
title_sort stratification of co-evolving genomic groups using ranked phylogenetic profiles
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775751/
https://www.ncbi.nlm.nih.gov/pubmed/19860884
http://dx.doi.org/10.1186/1471-2105-10-355