
QuartetS: a fast and accurate algorithm for large-scale orthology detection

The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes....


Bibliographic Details
Main authors: Yu, Chenggang, Zavaljevski, Nela, Desai, Valmik, Reifman, Jaques
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3141274/
https://www.ncbi.nlm.nih.gov/pubmed/21572104
http://dx.doi.org/10.1093/nar/gkr308
author Yu, Chenggang
Zavaljevski, Nela
Desai, Valmik
Reifman, Jaques
author_sort Yu, Chenggang
collection PubMed
description The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS and QuartetS-C should be preferred, respectively, in applications where high accuracy and high throughput are required.
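For orientation, the bi-directional best hit (BBH) baseline named in the description can be sketched as follows. This is a minimal illustration of the general BBH idea only; it is not the QuartetS method and not the authors' code. The `scores` mapping, the function names, and the example values are all hypothetical, and the sketch assumes a single symmetric similarity score per gene pair.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: scores[(a, b)] is a similarity score between
# gene a of genome A and gene b of genome B (higher is better).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, forward: bool = True) -> Dict[str, str]:
    """Map each query gene to its highest-scoring gene in the other genome.

    forward=True treats genome A genes as queries; forward=False treats
    genome B genes as queries. Ties keep the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (a, b) if forward else (b, a)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores: Scores) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores, forward=True)
    b_to_a = best_hits(scores, forward=False)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Made-up scores for two genes in each genome.
example: Scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 60.0,
    ("a2", "b1"): 80.0,
    ("a2", "b2"): 50.0,
}
print(bidirectional_best_hits(example))
```

In this toy input only `a1` and `b1` pick each other as best hits, so they form the single predicted pair; `a2` prefers `b1`, but `b1` does not prefer `a2`.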
format Online
Article
Text
id pubmed-3141274
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3141274 2011-07-22 QuartetS: a fast and accurate algorithm for large-scale orthology detection Yu, Chenggang Zavaljevski, Nela Desai, Valmik Reifman, Jaques Nucleic Acids Res Methods Online Oxford University Press 2011-07 2011-05-13 /pmc/articles/PMC3141274/ /pubmed/21572104 http://dx.doi.org/10.1093/nar/gkr308 Text en © The Author(s) 2011. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title QuartetS: a fast and accurate algorithm for large-scale orthology detection
topic Methods Online