SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier

BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different speci...

Bibliographic Details
Main authors: Hu, Xiao, Friedberg, Iddo
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812468/
https://www.ncbi.nlm.nih.gov/pubmed/31648300
http://dx.doi.org/10.1093/gigascience/giz118
author Hu, Xiao
Friedberg, Iddo
collection PubMed
description BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only in computational clusters. FINDINGS: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce the memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has a high accuracy. CONCLUSIONS: SwiftOrtho enables the accurate comparative genomic analyses of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho
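The seeding idea in the description (a reduced amino acid alphabet combined with spaced seeds) can be sketched in a few lines. This is a minimal illustration, not SwiftOrtho's implementation: the residue grouping in REDUCED_GROUPS, the seed pattern "1101011", and the function names are all assumptions chosen for the example.

```python
# Illustrative only: the alphabet grouping and the seed pattern below are
# assumptions for this sketch, not the ones SwiftOrtho actually uses.
REDUCED_GROUPS = ["AST", "C", "DN", "EQ", "FY", "G", "H", "IV", "KR", "LM", "P", "W"]

# Map each amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in REDUCED_GROUPS for aa in group}


def reduce_sequence(protein: str) -> str:
    """Rewrite a protein in the reduced alphabet; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def spaced_kmers(protein: str, pattern: str = "1101011"):
    """Yield (position, seed) pairs from the reduced sequence.

    Positions marked '1' in the pattern are kept; positions marked '0'
    are ignored, so a mismatch there does not change the seed.
    """
    reduced = reduce_sequence(protein)
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    for start in range(len(reduced) - len(pattern) + 1):
        window = reduced[start:start + len(pattern)]
        yield start, "".join(window[i] for i in keep)


def shared_seeds(a: str, b: str, pattern: str = "1101011") -> set:
    """Return the spaced seeds that two proteins have in common."""
    seeds_a = {seed for _, seed in spaced_kmers(a, pattern)}
    seeds_b = {seed for _, seed in spaced_kmers(b, pattern)}
    return seeds_a & seeds_b
```

In this sketch, two sequences that differ only by substitutions within a residue group, or at the ignored positions of the pattern, still share a seed. That is the sense in which the reduced alphabet and spaced seeds can recover sensitivity that a long exact k-mer would lose.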
format Online
Article
Text
id pubmed-6812468
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6812468 2019-10-28 SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Hu, Xiao; Friedberg, Iddo. Gigascience. Technical Note. Oxford University Press, 2019-10-24. http://dx.doi.org/10.1093/gigascience/giz118 Text en © The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier
topic Technical Note