
Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes

Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the...


Bibliographic Details
Main author: Uchiyama, Ikuo
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1351371/
https://www.ncbi.nlm.nih.gov/pubmed/16436801
http://dx.doi.org/10.1093/nar/gkj448
author Uchiyama, Ikuo
collection PubMed
description Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods.
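The description names two core steps: UPGMA clustering of all-against-all similarity data, then splitting the resulting trees so that intra-species paralogs end up in different groups. The following is a minimal Python sketch of those two ideas under stated assumptions. It is not DomClust. It skips domain fusion/fission detection entirely, treats unlisted gene pairs as similarity 0, and applies a deliberately naive splitting rule: a subtree is kept as one group only if no species occurs twice among its leaves. The `species:gene` id convention and the function names `upgma`, `leaves`, and `split_paralogs` are hypothetical.

```python
from itertools import combinations


def species_of(gene):
    """Return the species prefix of a hypothetical 'species:gene' id."""
    return gene.split(":", 1)[0]


def upgma(genes, sim):
    """Average-linkage (UPGMA) clustering applied directly to similarities.

    genes: list of gene ids
    sim:   dict frozenset({a, b}) -> similarity (missing pairs count as 0)
    Returns a nested-tuple binary tree whose leaves are gene ids.
    """
    clusters = {i: (g, 1) for i, g in enumerate(genes)}  # id -> (tree, size)
    # Cluster-to-cluster similarities keyed by (i, j) with i < j
    s = {(i, j): sim.get(frozenset((genes[i], genes[j])), 0.0)
         for i, j in combinations(range(len(genes)), 2)}
    next_id = len(genes)
    while len(clusters) > 1:
        # Merge the most similar pair of active clusters
        (i, j), _ = max(s.items(), key=lambda kv: kv[1])
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        # Size-weighted average similarity of the merged cluster to the rest
        merged = {k: (ni * s[(min(i, k), max(i, k))] + nj * s[(min(j, k), max(j, k))]) / (ni + nj)
                  for k in clusters}
        s = {p: v for p, v in s.items() if i not in p and j not in p}
        for k, v in merged.items():
            s[(k, next_id)] = v  # every existing id is smaller than next_id
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


def leaves(tree):
    """List the gene ids at the leaves of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def split_paralogs(tree):
    """Naive split: keep a subtree if each species occurs at most once, else recurse."""
    genes = leaves(tree)
    species = [species_of(g) for g in genes]
    if len(set(species)) == len(species):
        return [sorted(genes)]
    left, right = tree
    return split_paralogs(left) + split_paralogs(right)


# Toy input: two species, each with two paralogs (hypothetical scores)
genes = ["ecoli:a1", "ecoli:a2", "bsub:b1", "bsub:b2"]
sim = {
    frozenset(("ecoli:a1", "bsub:b1")): 90, frozenset(("ecoli:a2", "bsub:b2")): 85,
    frozenset(("ecoli:a1", "ecoli:a2")): 40, frozenset(("bsub:b1", "bsub:b2")): 40,
    frozenset(("ecoli:a1", "bsub:b2")): 35, frozenset(("ecoli:a2", "bsub:b1")): 38,
}
tree = upgma(genes, sim)
groups = split_paralogs(tree)
print(groups)  # [['bsub:b1', 'ecoli:a1'], ['bsub:b2', 'ecoli:a2']]
```

Here the root cluster contains two genes from each species, so the naive rule splits it into one group per ortholog pair. The splitting and domain-level procedures described for DomClust are more involved than this single rule.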
format Text
id pubmed-1351371
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-13513712006-02-03 Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes Uchiyama, Ikuo Nucleic Acids Res Article Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods. Oxford University Press 2006 2006-01-25 /pmc/articles/PMC1351371/ /pubmed/16436801 http://dx.doi.org/10.1093/nar/gkj448 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
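The abstract compares DomClust with clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion. A minimal Python sketch of BBH pairing follows. It assumes directional scores keyed by `(query, target)`, gene ids of the hypothetical form `species:gene`, and that ties keep the first hit encountered. The function names are illustrative, not taken from any of the compared tools.

```python
def species_of(gene):
    """Return the species prefix of a hypothetical 'species:gene' id."""
    return gene.split(":", 1)[0]


def best_hits(scores):
    """Map (query, target_species) -> the highest-scoring target gene."""
    best = {}
    for (query, target), score in scores.items():
        if species_of(query) == species_of(target):
            continue  # only inter-species hits are considered
        key = (query, species_of(target))
        if key not in best or score > best[key][0]:
            best[key] = (score, target)
    return {key: target for key, (_, target) in best.items()}


def bidirectional_best_hits(scores):
    """Return sorted gene pairs that are each other's best hit."""
    best = best_hits(scores)
    pairs = set()
    for (query, _), target in best.items():
        # Reciprocal check: is query the best hit of target within query's species?
        if best.get((target, species_of(query))) == query:
            pairs.add(tuple(sorted((query, target))))
    return sorted(pairs)


# Toy directional scores (hypothetical)
scores = {
    ("ecoli:a1", "bsub:b1"): 90, ("ecoli:a1", "bsub:b2"): 35,
    ("ecoli:a2", "bsub:b1"): 38, ("ecoli:a2", "bsub:b2"): 85,
    ("ecoli:a3", "bsub:b1"): 50,
    ("bsub:b1", "ecoli:a1"): 88, ("bsub:b1", "ecoli:a2"): 40,
    ("bsub:b2", "ecoli:a1"): 30, ("bsub:b2", "ecoli:a2"): 80,
}
bbh = bidirectional_best_hits(scores)
print(bbh)  # [('bsub:b1', 'ecoli:a1'), ('bsub:b2', 'ecoli:a2')]
```

In this toy input, `ecoli:a3` has `bsub:b1` as its best hit, but `bsub:b1` prefers `ecoli:a1`, so no BBH pair is formed for `ecoli:a3`.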
title Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes
topic Article