Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes

BACKGROUND: Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recal...

Bibliographic Details
Main authors: Rane, Rahul V., Oakeshott, John G., Nguyen, Thu, Hoffmann, Ary A., Lee, Siu F.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5580312/
https://www.ncbi.nlm.nih.gov/pubmed/28859620
http://dx.doi.org/10.1186/s12864-017-4079-6
_version_ 1783260884709146624
author Rane, Rahul V.
Oakeshott, John G.
Nguyen, Thu
Hoffmann, Ary A.
Lee, Siu F.
author_facet Rane, Rahul V.
Oakeshott, John G.
Nguyen, Thu
Hoffmann, Ary A.
Lee, Siu F.
author_sort Rane, Rahul V.
collection PubMed
description BACKGROUND: Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. RESULTS: Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com.
CONCLUSION: We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4079-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5580312
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55803122017-09-07 Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes Rane, Rahul V. Oakeshott, John G. Nguyen, Thu Hoffmann, Ary A. Lee, Siu F. BMC Genomics Software BACKGROUND: Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. RESULTS: Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines.
Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com. CONCLUSION: We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4079-6) contains supplementary material, which is available to authorized users. BioMed Central 2017-08-31 /pmc/articles/PMC5580312/ /pubmed/28859620 http://dx.doi.org/10.1186/s12864-017-4079-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Rane, Rahul V.
Oakeshott, John G.
Nguyen, Thu
Hoffmann, Ary A.
Lee, Siu F.
Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title_full Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title_fullStr Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title_full_unstemmed Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title_short Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
title_sort orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5580312/
https://www.ncbi.nlm.nih.gov/pubmed/28859620
http://dx.doi.org/10.1186/s12864-017-4079-6
work_keys_str_mv AT ranerahulv orthonomeanewpipelineforpredictinghighqualityorthologuegenesetsapplicabletocompleteanddraftgenomes
AT oakeshottjohng orthonomeanewpipelineforpredictinghighqualityorthologuegenesetsapplicabletocompleteanddraftgenomes
AT nguyenthu orthonomeanewpipelineforpredictinghighqualityorthologuegenesetsapplicabletocompleteanddraftgenomes
AT hoffmannarya orthonomeanewpipelineforpredictinghighqualityorthologuegenesetsapplicabletocompleteanddraftgenomes
AT leesiuf orthonomeanewpipelineforpredictinghighqualityorthologuegenesetsapplicabletocompleteanddraftgenomes