
Improvement of domain-level ortholog clustering by optimizing domain-specific sum-of-pairs score

BACKGROUND: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ort...


Bibliographic Details
Main Authors: Chiba, Hirokazu; Uchiyama, Ikuo
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035852/
https://www.ncbi.nlm.nih.gov/pubmed/24885064
http://dx.doi.org/10.1186/1471-2105-15-148
author Chiba, Hirokazu
Uchiyama, Ikuo
collection PubMed
description BACKGROUND: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but this approach often fails to determine appropriate boundaries. RESULTS: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline, DomRefine, that improves domain-level clustering by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. We observed that the agreement between the resulting classification and the classifications in the reference databases improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classifications in the eggNOG databases when TIGRFAMs was used as the reference database. CONCLUSIONS: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combined with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.
format Online
Article
Text
id pubmed-4035852
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4035852 2014-06-11 Improvement of domain-level ortholog clustering by optimizing domain-specific sum-of-pairs score Chiba, Hirokazu Uchiyama, Ikuo BMC Bioinformatics Research Article BioMed Central 2014-05-18 /pmc/articles/PMC4035852/ /pubmed/24885064 http://dx.doi.org/10.1186/1471-2105-15-148 Text en Copyright © 2014 Chiba and Uchiyama; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improvement of domain-level ortholog clustering by optimizing domain-specific sum-of-pairs score
topic Research Article