
ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies

TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free met...


Bibliographic Details
Main Authors: Léonard, Raphaël R., Leleu, Marie, Van Vlierberghe, Mick, Cornet, Luc, Kerff, Frédéric, Baurain, Denis
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/
https://www.ncbi.nlm.nih.gov/pubmed/33996287
http://dx.doi.org/10.7717/peerj.11348
_version_ 1783689768310145024
author Léonard, Raphaël R.
Leleu, Marie
Van Vlierberghe, Mick
Cornet, Luc
Kerff, Frédéric
Baurain, Denis
author_facet Léonard, Raphaël R.
Leleu, Marie
Van Vlierberghe, Mick
Cornet, Luc
Kerff, Frédéric
Baurain, Denis
author_sort Léonard, Raphaël R.
collection PubMed
description TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd.
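The description above names three ingredients: k-mer profiles, single-linkage clustering and dereplication. The following is a minimal Python sketch of that general idea only, not TQMD's actual implementation. The function names, the Jaccard distance, the default values of `k` and `threshold`, and the rule of keeping the alphabetically first genome of each cluster are all assumptions made for illustration. The sketch performs a single clustering pass and omits the iterative and divide-and-conquer steps mentioned in the description.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers (words of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance between two k-mer sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(
    genomes: dict[str, str], k: int, threshold: float
) -> list[list[str]]:
    """Group genomes so that any pair closer than threshold shares a cluster.

    Single linkage: two clusters merge as soon as one pair of members is
    within the threshold. Implemented here with a small union-find.
    """
    names = sorted(genomes)
    profiles = {name: kmers(genomes[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the alphabetically smaller root for determinism.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters: dict[str, list[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


def dereplicate(
    genomes: dict[str, str], k: int = 4, threshold: float = 0.5
) -> list[str]:
    """Keep one representative per cluster (hypothetical rule: first by name)."""
    return [cluster[0] for cluster in single_linkage_clusters(genomes, k, threshold)]
```

With three toy sequences, `dereplicate({"genome_a": "ACGTACGTAC", "genome_b": "ACGTACGTAG", "genome_c": "TTTTGGGGCC"})` returns `["genome_a", "genome_c"]`: the first two sequences share four of their five distinct 4-mers (Jaccard distance 0.2) and collapse into one cluster, while the third shares none. The all-pairs comparison is quadratic in the number of genomes, which is presumably the cost that a divide-and-conquer strategy is meant to contain.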
format Online
Article
Text
id pubmed-8106394
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8106394 2021-05-13 ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies Léonard, Raphaël R. Leleu, Marie Van Vlierberghe, Mick Cornet, Luc Kerff, Frédéric Baurain, Denis PeerJ Bioinformatics TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd. PeerJ Inc. 2021-05-05 /pmc/articles/PMC8106394/ /pubmed/33996287 http://dx.doi.org/10.7717/peerj.11348 Text en ©2021 Léonard et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Léonard, Raphaël R.
Leleu, Marie
Van Vlierberghe, Mick
Cornet, Luc
Kerff, Frédéric
Baurain, Denis
ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title_full ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title_fullStr ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title_full_unstemmed ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title_short ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
title_sort torquemada: tool for retrieving queried eubacteria, metadata and dereplicating assemblies
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/
https://www.ncbi.nlm.nih.gov/pubmed/33996287
http://dx.doi.org/10.7717/peerj.11348
work_keys_str_mv AT leonardraphaelr torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies
AT leleumarie torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies
AT vanvlierberghemick torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies
AT cornetluc torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies
AT kerfffrederic torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies
AT bauraindenis torquemadatoolforretrievingqueriedeubacteriametadataanddereplicatingassemblies