
MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

BACKGROUND: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high...


Bibliographic Details
Main authors: Shi, Guanqun; Zhang, Liqing; Jiang, Tao
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821317/
https://www.ncbi.nlm.nih.gov/pubmed/20053291
http://dx.doi.org/10.1186/1471-2105-11-10
_version_ 1782177424671244288
author Shi, Guanqun
Zhang, Liqing
Jiang, Tao
author_facet Shi, Guanqun
Zhang, Liqing
Jiang, Tao
author_sort Shi, Guanqun
collection PubMed
description BACKGROUND: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model). RESULTS: In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. 
Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is actually better than that of InParanoid in the simulation tests. CONCLUSIONS: Our preliminary experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for one-to-one ortholog assignment between closely related genomes. The software is available to the public for free and included as online supplementary material.
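The preprocessing step described in the RESULTS part of the abstract (collapse each set of tandemly duplicated inparalogs to a single gene, then run MSOAR on the reduced genomes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a genome is an ordered list of hypothetical `(gene_id, family_id)` pairs, that "tandem" means strictly adjacent genes of the same family, and that the inparalog decision (made by phylogenetic tree reconciliation in the paper) is supplied by the caller as a hypothetical predicate `is_inparalog_set`. The abstract does not say which gene of a set is retained, so keeping the first one is an arbitrary choice here.

```python
from itertools import groupby
from typing import Callable, List, Tuple

# Hypothetical representation: (gene_id, family_id), listed in chromosomal order.
Gene = Tuple[str, str]


def collapse_tandem_inparalogs(
    genome: List[Gene],
    is_inparalog_set: Callable[[List[Gene]], bool],
) -> List[Gene]:
    """Keep one representative per run of adjacent same-family genes
    that the caller-supplied predicate classifies as inparalogs."""
    reduced: List[Gene] = []
    # groupby merges only consecutive items with equal keys,
    # so each group is one run of adjacent same-family genes.
    for _, run in groupby(genome, key=lambda gene: gene[1]):
        cluster = list(run)
        if len(cluster) > 1 and is_inparalog_set(cluster):
            reduced.append(cluster[0])  # arbitrary choice of representative
        else:
            reduced.extend(cluster)
    return reduced


# Toy genome: family B occurs as a tandem run of three genes,
# family A occurs twice but not adjacently.
genome = [
    ("g1", "A"), ("g2", "B"), ("g3", "B"),
    ("g4", "B"), ("g5", "C"), ("g6", "A"),
]

# Assumption for the demo: every tandem run is treated as an inparalog set.
reduced = collapse_tandem_inparalogs(genome, lambda cluster: True)
print(reduced)
```

In this toy input only the run `g2`, `g3`, `g4` is collapsed. The two family-A genes are not adjacent, so they are left for the downstream ortholog assignment to resolve.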
format Text
id pubmed-2821317
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2821317 2010-02-15 MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement Shi, Guanqun Zhang, Liqing Jiang, Tao BMC Bioinformatics Methodology article BioMed Central 2010-01-06 /pmc/articles/PMC2821317/ /pubmed/20053291 http://dx.doi.org/10.1186/1471-2105-11-10 Text en Copyright ©2010 Shi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology article
Shi, Guanqun
Zhang, Liqing
Jiang, Tao
MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement
title MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement
title_sort msoar 2.0: incorporating tandem duplications into ortholog assignment based on genome rearrangement
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821317/
https://www.ncbi.nlm.nih.gov/pubmed/20053291
http://dx.doi.org/10.1186/1471-2105-11-10