JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm
MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorith...
Main authors: | Miller, Justin B; Pickett, Brandon D; Ridge, Perry G |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6378933/ https://www.ncbi.nlm.nih.gov/pubmed/30084941 http://dx.doi.org/10.1093/bioinformatics/bty669 |
_version_ | 1783396018027495424 |
---|---|
author | Miller, Justin B; Pickett, Brandon D; Ridge, Perry G |
author_facet | Miller, Justin B; Pickett, Brandon D; Ridge, Perry G |
author_sort | Miller, Justin B |
collection | PubMed |
description | MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory intensive. RESULTS: In contrast to existing approaches, JustOrthologs exploits the conservation of gene structure by using the lengths of coding sequence regions and dinucleotide percentages to identify orthologs. In comparison to OrthoMCL, OMA and OrthoFinder, JustOrthologs decreases ortholog identification runtime by more than 96% and achieves comparable precision and recall scores. The computational speedup allowed us to conduct pairwise comparisons of 1197 complete genomes (780 eukaryotes and 417 archaea). We confirmed gene annotations for 384 120 genes, grouped 1 675 415 genes in previously unreported ortholog groups, and identified 51 429 potentially mislabeled genes across 622 843 ortholog groups. AVAILABILITY AND IMPLEMENTATION: JustOrthologs is an open source collaborative software package available in the GitHub repository: https://github.com/ridgelab/JustOrthologs/. All test FASTA files used for comparisons are freely available at https://github.com/ridgelab/JustOrthologs/comparisonFastaFiles/. Reference genomes used in this work are available for download from the NCBI repository: ftp://ftp.ncbi.nih.gov/genomes/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6378933 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6378933 2019-02-22 JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm Miller, Justin B; Pickett, Brandon D; Ridge, Perry G Bioinformatics Original Papers Oxford University Press 2019-02-15 2018-08-01 /pmc/articles/PMC6378933/ /pubmed/30084941 http://dx.doi.org/10.1093/bioinformatics/bty669 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers; Miller, Justin B; Pickett, Brandon D; Ridge, Perry G; JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm |
title | JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm |
title_sort | justorthologs: a fast, accurate and user-friendly ortholog identification algorithm |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6378933/ https://www.ncbi.nlm.nih.gov/pubmed/30084941 http://dx.doi.org/10.1093/bioinformatics/bty669 |
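
The description above says that JustOrthologs compares the lengths of coding sequence (CDS) regions and dinucleotide percentages instead of running all-versus-all BLAST. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names, the representation of a gene as a list of CDS strings, the summed absolute difference used as a distance, and the `max_distance` and `tolerance` thresholds are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_percentages(seq):
    """Percentage of each dinucleotide, counted over overlapping windows.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: 100.0 * counts[d] / total for d in DINUCLEOTIDES}


def cds_lengths_match(lengths_a, lengths_b, tolerance=0):
    """True if both genes have the same number of CDS regions and each
    pair of region lengths differs by at most `tolerance` (hypothetical)."""
    if len(lengths_a) != len(lengths_b):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(lengths_a, lengths_b))


def dinucleotide_distance(seq_a, seq_b):
    """Sum of absolute differences between dinucleotide percentages
    (an assumed distance measure, chosen for simplicity)."""
    pa = dinucleotide_percentages(seq_a)
    pb = dinucleotide_percentages(seq_b)
    return sum(abs(pa[d] - pb[d]) for d in DINUCLEOTIDES)


def is_candidate_ortholog(gene_a, gene_b, max_distance=10.0, tolerance=0):
    """Each gene is a list of CDS region sequences (hypothetical layout).

    A pair is a candidate when the CDS length pattern matches and the
    dinucleotide profiles of the concatenated CDS are close.
    """
    lengths_a = [len(cds) for cds in gene_a]
    lengths_b = [len(cds) for cds in gene_b]
    if not cds_lengths_match(lengths_a, lengths_b, tolerance):
        return False
    distance = dinucleotide_distance("".join(gene_a), "".join(gene_b))
    return distance <= max_distance
```

Checking the cheap length comparison first means the dinucleotide profile is only computed for pairs that already share a gene structure, which is one plausible way to avoid the cost of sequence alignment.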