PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing

Bibliographic Details
Main authors: Goussarov, Gleb; Cleenwerck, Ilse; Mysara, Mohamed; Leys, Natalie; Monsieurs, Pieter; Tahon, Guillaume; Carlier, Aurélien; Vandamme, Peter; Van Houdt, Rob
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7178395/
https://www.ncbi.nlm.nih.gov/pubmed/31899493
http://dx.doi.org/10.1093/bioinformatics/btz964
_version_ 1783525447242350592
author Goussarov, Gleb
Cleenwerck, Ilse
Mysara, Mohamed
Leys, Natalie
Monsieurs, Pieter
Tahon, Guillaume
Carlier, Aurélien
Vandamme, Peter
Van Houdt, Rob
collection PubMed
description MOTIVATION: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances. RESULTS: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses. AVAILABILITY AND IMPLEMENTATION: The method introduced here was implemented, together with other existing methods, in GenDisCal, a dependency-free program written in C, available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7178395
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7178395 2020-04-28 Bioinformatics Original Papers Oxford University Press 2020-04-15 2020-01-03 /pmc/articles/PMC7178395/ /pubmed/31899493 http://dx.doi.org/10.1093/bioinformatics/btz964 Text en
© The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7178395/
https://www.ncbi.nlm.nih.gov/pubmed/31899493
http://dx.doi.org/10.1093/bioinformatics/btz964
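
The description field of this record refers to inter-genomic distances computed from short-oligonucleotide frequencies. As a rough illustration of that general idea only, the sketch below counts overlapping tetranucleotides in two sequences and compares the resulting frequency vectors. It is not the PaSiT metric, not TETRA, and not GenDisCal code: the function names, the toy sequences, and the choice of plain Euclidean distance are all assumptions made for this example, and the sketch ignores details such as reverse-complement strands that a real tool may handle.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers, counted over overlapping windows.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean_distance(u, v):
    """Plain Euclidean distance between two equal-length frequency vectors.

    A generic stand-in chosen for this sketch, not the distance defined in the paper.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences (made up for the example, not real genomes).
genome_a = "ATGCGATACGCTTGAGGCTAACGTTAGC"
genome_b = "ATGCGATACGCTTGAGGCTAACGTTAGC"
genome_c = "TTTTTTAAAAAATTTTTTAAAAAATTTT"

dist_ab = euclidean_distance(kmer_frequencies(genome_a), kmer_frequencies(genome_b))
dist_ac = euclidean_distance(kmer_frequencies(genome_a), kmer_frequencies(genome_c))
print(dist_ab, dist_ac)
```

Identical sequences give a distance of zero, and sequences with different oligonucleotide composition give a positive distance. No alignment step is involved, which is the property the description contrasts with average nucleotide identity.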