Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the...
Main author: | Uchiyama, Ikuo |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1351371/ https://www.ncbi.nlm.nih.gov/pubmed/16436801 http://dx.doi.org/10.1093/nar/gkj448 |
_version_ | 1782126659285024768 |
---|---|
author | Uchiyama, Ikuo |
author_facet | Uchiyama, Ikuo |
author_sort | Uchiyama, Ikuo |
collection | PubMed |
description | Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods. |
format | Text |
id | pubmed-1351371 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1351371 2006-02-03 Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes Uchiyama, Ikuo Nucleic Acids Res Article Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods. Oxford University Press 2006 2006-01-25 /pmc/articles/PMC1351371/ /pubmed/16436801 http://dx.doi.org/10.1093/nar/gkj448 Text en © The Author 2006. Published by Oxford University Press. All rights reserved |
spellingShingle | Article Uchiyama, Ikuo Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title | Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title_full | Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title_fullStr | Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title_full_unstemmed | Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title_short | Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
title_sort | hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1351371/ https://www.ncbi.nlm.nih.gov/pubmed/16436801 http://dx.doi.org/10.1093/nar/gkj448 |
work_keys_str_mv | AT uchiyamaikuo hierarchicalclusteringalgorithmforcomprehensiveorthologousdomainclassificationinmultiplegenomes |
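
The description above says the method classifies genes with the hierarchical clustering algorithm UPGMA, starting from all-against-all similarity data. As a rough illustration of that clustering step only, the sketch below implements textbook UPGMA on pairwise dissimilarities. It is not DomClust: it does no domain splitting and no splitting of trees into orthologous groups. The function name `upgma`, the input layout (a dict keyed by `frozenset` pairs of gene names) and the gene names and distances are all hypothetical, and the use of dissimilarities rather than similarity scores is an assumption made for simplicity.

```python
from itertools import combinations


def upgma(names, dist):
    """Textbook UPGMA (average linkage); a sketch, not the DomClust procedure.

    names: list of leaf labels (hypothetical gene identifiers).
    dist:  dict mapping frozenset({a, b}) -> dissimilarity between leaves a and b.
    Returns (tree, merges): the tree as nested tuples, and a list of
    (left_subtree, right_subtree, merge_distance) in merge order.
    """
    # Active clusters: integer id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}

    # Working distance table keyed by pairs of cluster ids.
    d = {}
    for i, j in combinations(range(len(names)), 2):
        d[frozenset((i, j))] = dist[frozenset((names[i], names[j]))]

    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=lambda p: d[p],
        )
        a, b = sorted(pair)
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        merges.append((tree_a, tree_b, d[pair]))

        # Distance from the merged cluster to every remaining cluster is the
        # size-weighted mean of the two old distances (the UPGMA update rule).
        for c in clusters:
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)

        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    ((tree, _size),) = clusters.values()
    return tree, merges


if __name__ == "__main__":
    # Hypothetical genes and dissimilarities, for illustration only.
    genes = ["geneA", "geneB", "geneC", "geneD"]
    pairwise = {
        frozenset(("geneA", "geneB")): 2,
        frozenset(("geneA", "geneC")): 6,
        frozenset(("geneA", "geneD")): 10,
        frozenset(("geneB", "geneC")): 6,
        frozenset(("geneB", "geneD")): 10,
        frozenset(("geneC", "geneD")): 8,
    }
    tree, merges = upgma(genes, pairwise)
    print(tree)
    for left, right, distance in merges:
        print(left, right, distance)
```

The value recorded for each merge is the average inter-cluster dissimilarity at which the two clusters were joined. Cutting the tree at a chosen threshold on that value would give flat groups, but where DomClust actually cuts, and how it treats paralogs and fused domains, is described in the article itself and is not reproduced here.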