
Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-s...


Bibliographic Details
Main authors: Chen, Feng; Mackey, Aaron J.; Vermunt, Jeroen K.; Roos, David S.
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1849888/
https://www.ncbi.nlm.nih.gov/pubmed/17440619
http://dx.doi.org/10.1371/journal.pone.0000383
author Chen, Feng
Mackey, Aaron J.
Vermunt, Jeroen K.
Roos, David S.
collection PubMed
description Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity >80%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By way of using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications.
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.
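The description above says that LCA can infer sensitivities and specificities from the agreement and disagreement among methods, without a gold standard. The following is a minimal sketch of that idea, not the authors' implementation: it fits a two-class latent class model by expectation-maximization on simulated yes/no calls. The function names (simulate, fit_lca), the simulated accuracies, and the assumption that methods err independently given the true status are all illustrative assumptions; the record notes that the study also assesses statistical dependence between strategies, which this sketch ignores.

```python
import random
from collections import Counter


def simulate(n, prev, sens, spec, seed=0):
    """Draw n items with a hidden true status and one 0/1 call per method.

    Hypothetical helper: the accuracies passed in are made-up test values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        truth = rng.random() < prev
        rows.append(tuple(
            int(rng.random() < (se if truth else 1.0 - sp))
            for se, sp in zip(sens, spec)
        ))
    return rows


def clip(p, eps=1e-9):
    """Keep probabilities strictly inside (0, 1)."""
    return min(max(p, eps), 1.0 - eps)


def fit_lca(calls, max_iter=500, tol=1e-10):
    """Two-class LCA by EM, assuming conditionally independent methods.

    Returns (prevalence, sensitivities, specificities).
    """
    n_methods = len(calls[0])
    patterns = Counter(tuple(row) for row in calls)  # counts per vote pattern
    total = sum(patterns.values())

    # Starting values above 0.5 tie class 1 to "positive call is correct".
    prev, sens, spec = 0.5, [0.8] * n_methods, [0.8] * n_methods

    for _ in range(max_iter):
        # E-step: posterior probability that each pattern is a true positive.
        post = {}
        for pat in patterns:
            p1, p0 = prev, 1.0 - prev
            for j, x in enumerate(pat):
                p1 *= sens[j] if x else (1.0 - sens[j])
                p0 *= (1.0 - spec[j]) if x else spec[j]
            post[pat] = p1 / (p1 + p0)

        # M-step: re-estimate parameters from posterior-weighted counts.
        w1 = sum(n * post[pat] for pat, n in patterns.items())
        w0 = total - w1
        new_sens = [
            clip(sum(n * post[pat] * pat[j] for pat, n in patterns.items()) / w1)
            for j in range(n_methods)
        ]
        new_spec = [
            clip(sum(n * (1.0 - post[pat]) * (1 - pat[j])
                     for pat, n in patterns.items()) / w0)
            for j in range(n_methods)
        ]
        new_prev = clip(w1 / total)

        old = [prev] + sens + spec
        new = [new_prev] + new_sens + new_spec
        prev, sens, spec = new_prev, new_sens, new_spec
        if max(abs(a - b) for a, b in zip(new, old)) < tol:
            break

    return prev, sens, spec


# Four hypothetical methods with made-up accuracies.
true_sens = [0.90, 0.80, 0.70, 0.85]
true_spec = [0.85, 0.90, 0.95, 0.80]
prev, sens, spec = fit_lca(simulate(20000, 0.4, true_sens, true_spec))
print("prevalence :", round(prev, 3))
print("sensitivity:", [round(s, 3) for s in sens])
print("specificity:", [round(s, 3) for s in spec])
```

With at least three methods the two-class model is identifiable under the independence assumption; four are used here so the estimates are less fragile. Collapsing the calls into vote patterns keeps each EM iteration cheap regardless of how many items are scored.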
format Text
id pubmed-1849888
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1849888 2007-04-18 PLoS One Research Article Public Library of Science 2007-04-18 /pmc/articles/PMC1849888/ /pubmed/17440619 http://dx.doi.org/10.1371/journal.pone.0000383 Text en Chen et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes
topic Research Article