
JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm

MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorith...


Bibliographic Details
Main authors: Miller, Justin B; Pickett, Brandon D; Ridge, Perry G
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6378933/
https://www.ncbi.nlm.nih.gov/pubmed/30084941
http://dx.doi.org/10.1093/bioinformatics/bty669
author Miller, Justin B
Pickett, Brandon D
Ridge, Perry G
collection PubMed
description MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory intensive. RESULTS: In contrast to existing approaches, JustOrthologs exploits the conservation of gene structure by using the lengths of coding sequence regions and dinucleotide percentages to identify orthologs. In comparison to OrthoMCL, OMA and OrthoFinder, JustOrthologs decreases ortholog identification runtime by more than 96% and achieves comparable precision and recall scores. The computational speedup allowed us to conduct pairwise comparisons of 1197 complete genomes (780 eukaryotes and 417 archaea). We confirmed gene annotations for 384 120 genes, grouped 1 675 415 genes in previously unreported ortholog groups, and identified 51 429 potentially mislabeled genes across 622 843 ortholog groups. AVAILABILITY AND IMPLEMENTATION: JustOrthologs is an open source collaborative software package available in the GitHub repository: https://github.com/ridgelab/JustOrthologs/. All test FASTA files used for comparisons are freely available at https://github.com/ridgelab/JustOrthologs/comparisonFastaFiles/. Reference genomes used in this work are available for download from the NCBI repository: ftp://ftp.ncbi.nih.gov/genomes/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
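The abstract names two features used in place of BLAST comparisons: the lengths of coding sequence (CDS) regions and dinucleotide percentages. The following is a minimal sketch of how those two features could be computed and compared for a pair of genes. It is not the JustOrthologs implementation: the function names, the exact-match rule on CDS lengths, the distance measure, and the `max_distance` threshold are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# All 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_percentages(seq):
    """Percentage of each dinucleotide, counted over overlapping windows."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Ignore windows that contain ambiguous bases such as N.
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {d: (100.0 * counts[d] / total if total else 0.0)
            for d in DINUCLEOTIDES}


def cds_lengths(cds_regions):
    """Lengths of a gene's CDS regions, in order."""
    return [len(region) for region in cds_regions]


def profile_distance(p, q):
    """Sum of absolute differences between two dinucleotide profiles."""
    return sum(abs(p[d] - q[d]) for d in DINUCLEOTIDES)


def looks_similar(gene_a, gene_b, max_distance=10.0):
    """Hypothetical rule: identical CDS lengths and close dinucleotide profiles.

    Each gene is a list of CDS region sequences. The threshold is an
    arbitrary illustrative value, not one taken from the paper.
    """
    if cds_lengths(gene_a) != cds_lengths(gene_b):
        return False
    profile_a = dinucleotide_percentages("".join(gene_a))
    profile_b = dinucleotide_percentages("".join(gene_b))
    return profile_distance(profile_a, profile_b) <= max_distance
```

For example, `looks_similar(["ATGGCC", "GGTTAA"], ["ATGGCC", "GGTTAA"])` compares two genes with two CDS regions each. Because both features are cheap to compute per gene, a comparison of this kind avoids the all-versus-all alignment step the abstract describes as the bottleneck.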
format Online
Article
Text
id pubmed-6378933
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6378933 2019-02-22 JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm Miller, Justin B Pickett, Brandon D Ridge, Perry G Bioinformatics Original Papers Oxford University Press 2019-02-15 2018-08-01 /pmc/articles/PMC6378933/ /pubmed/30084941 http://dx.doi.org/10.1093/bioinformatics/bty669 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm
topic Original Papers