
Improving the specificity of high-throughput ortholog prediction

BACKGROUND: Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phyloge...


Bibliographic Details
Main authors: Fulton, Debra L, Li, Yvonne Y, Laird, Matthew R, Horsman, Benjamin GS, Roche, Fiona M, Brinkman, Fiona SL
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524997/
https://www.ncbi.nlm.nih.gov/pubmed/16729895
http://dx.doi.org/10.1186/1471-2105-7-270
author Fulton, Debra L
Li, Yvonne Y
Laird, Matthew R
Horsman, Benjamin GS
Roche, Fiona M
Brinkman, Fiona SL
author_sort Fulton, Debra L
collection PubMed
description BACKGROUND: Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phylogeny to gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is an increasing interest in identifying orthologs most likely to have retained similar function. RESULTS: To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach) – identifying which orthologs most closely reflect species divergence and may more likely have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false-positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (Pseudomonas putida – Pseudomonas syringae species comparison). Higher quality (more precise) datasets of orthologs, which we term "ssd-orthologs" (supporting-species-divergence-orthologs), were also constructed. 
These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at (software under GNU General Public License). CONCLUSION: The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.
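The two ideas in the abstract above, reciprocal best hits and distance ratios against an outgroup, can be illustrated with a minimal Python sketch. It is not the Ortholuge implementation: the abstract does not state the exact ratios or cutoffs, so the ratio definitions, the `cutoff` parameter, and all function and gene names here are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(best_ab: Dict[str, str],
                         best_ba: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    best_ab maps each gene in genome A to its top-scoring hit in genome B;
    best_ba is the reverse mapping. Both are hypothetical inputs that would
    normally be derived from BLAST output.
    """
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def distance_ratios(d_12: float, d_1o: float, d_2o: float) -> Tuple[float, float]:
    """Assumed ratios: ingroup-ingroup distance over each ingroup-outgroup distance.

    d_12 is the gene distance between the two comparison species;
    d_1o and d_2o are the distances from each of them to the outgroup gene.
    """
    return d_12 / d_1o, d_12 / d_2o


def is_atypical(ratios: Tuple[float, float], cutoff: float) -> bool:
    """Flag a predicted ortholog pair if any ratio exceeds the (assumed) cutoff."""
    return any(r > cutoff for r in ratios)


# Toy example: a2 and a3 both hit b2, but b2's best hit is a3, so (a2, b2) is dropped.
pairs = reciprocal_best_hits(
    {"a1": "b1", "a2": "b2", "a3": "b2"},
    {"b1": "a1", "b2": "a3"},
)
```

A pair whose two comparison-species genes are nearly as far from each other as they are from the outgroup gene yields ratios close to 1, which is the kind of atypical relative divergence that would suggest a paralog was picked up by the reciprocal-best-hit step.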
format Text
id pubmed-1524997
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1524997 2006-08-01 Improving the specificity of high-throughput ortholog prediction Fulton, Debra L Li, Yvonne Y Laird, Matthew R Horsman, Benjamin GS Roche, Fiona M Brinkman, Fiona SL BMC Bioinformatics Methodology Article BioMed Central 2006-05-28 /pmc/articles/PMC1524997/ /pubmed/16729895 http://dx.doi.org/10.1186/1471-2105-7-270 Text en Copyright © 2006 Fulton et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving the specificity of high-throughput ortholog prediction
topic Methodology Article