PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing
MOTIVATION: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-ba...
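Main authors: | Goussarov, Gleb; Cleenwerck, Ilse; Mysara, Mohamed; Leys, Natalie; Monsieurs, Pieter; Tahon, Guillaume; Carlier, Aurélien; Vandamme, Peter; Van Houdt, Rob |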
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7178395/ https://www.ncbi.nlm.nih.gov/pubmed/31899493 http://dx.doi.org/10.1093/bioinformatics/btz964 |
_version_ | 1783525447242350592 |
---|---|
author | Goussarov, Gleb Cleenwerck, Ilse Mysara, Mohamed Leys, Natalie Monsieurs, Pieter Tahon, Guillaume Carlier, Aurélien Vandamme, Peter Van Houdt, Rob |
author_facet | Goussarov, Gleb Cleenwerck, Ilse Mysara, Mohamed Leys, Natalie Monsieurs, Pieter Tahon, Guillaume Carlier, Aurélien Vandamme, Peter Van Houdt, Rob |
author_sort | Goussarov, Gleb |
collection | PubMed |
description | MOTIVATION: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods do offer a faster alternative but at the expense of accuracy. Here, we aim to address this shortcoming by providing a software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances. RESULTS: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses. AVAILABILITY AND IMPLEMENTATION: The method introduced here was implemented, together with other existing methods, in a dependency-free software written in C, GenDisCal, available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7178395 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7178395 2020-04-28 PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing Goussarov, Gleb Cleenwerck, Ilse Mysara, Mohamed Leys, Natalie Monsieurs, Pieter Tahon, Guillaume Carlier, Aurélien Vandamme, Peter Van Houdt, Rob Bioinformatics Original Papers MOTIVATION: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods do offer a faster alternative but at the expense of accuracy. Here, we aim to address this shortcoming by providing a software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances. RESULTS: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses. AVAILABILITY AND IMPLEMENTATION: The method introduced here was implemented, together with other existing methods, in a dependency-free software written in C, GenDisCal, available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-04-15 2020-01-03 /pmc/articles/PMC7178395/ /pubmed/31899493 http://dx.doi.org/10.1093/bioinformatics/btz964 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Goussarov, Gleb Cleenwerck, Ilse Mysara, Mohamed Leys, Natalie Monsieurs, Pieter Tahon, Guillaume Carlier, Aurélien Vandamme, Peter Van Houdt, Rob PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title | PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title_full | PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title_fullStr | PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title_full_unstemmed | PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title_short | PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
title_sort | pasit: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7178395/ https://www.ncbi.nlm.nih.gov/pubmed/31899493 http://dx.doi.org/10.1093/bioinformatics/btz964 |
work_keys_str_mv | AT goussarovgleb pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT cleenwerckilse pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT mysaramohamed pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT leysnatalie pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT monsieurspieter pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT tahonguillaume pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT carlieraurelien pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT vandammepeter pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping AT vanhoudtrob pasitanovelapproachbasedonshortoligonucleotidefrequenciesforefficientbacterialidentificationandtyping |
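
The description above refers to computing inter-genomic distances from short-oligonucleotide frequencies. This record does not describe the PaSiT algorithm itself, so the following is only a generic sketch of the underlying idea: count tetranucleotides on both strands, normalize to frequencies, and compare two sequences with a Euclidean distance. The function names (`kmer_frequencies`, `frequency_distance`) and the choice of Euclidean distance are assumptions made for illustration; they are not taken from PaSiT, TETRA, or GenDisCal.

```python
from collections import Counter
from itertools import product
import math

K = 4  # tetranucleotides; set to 6 for hexanucleotides
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=K):
    """Relative frequency of every k-mer, counted on both strands.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


def frequency_distance(seq_a, seq_b, k=K):
    """Euclidean distance between two k-mer frequency profiles.

    Illustrative only: this is NOT the PaSiT distance.
    """
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((fa[m] - fb[m]) ** 2 for m in fa))
```

Because both strands are counted, a sequence and its reverse complement have identical profiles, so their distance is zero. No alignment is involved, which is why frequency-based comparisons of this kind scale to large genome collections.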