
Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULT...


Bibliographic Details
Main Authors: Aflitos, Saulo Alves, Severing, Edouard, Sanchez-Perez, Gabino, Peters, Sander, de Jong, Hans, de Ridder, Dick
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4630969/
https://www.ncbi.nlm.nih.gov/pubmed/26525298
http://dx.doi.org/10.1186/s12859-015-0806-7
_version_ 1782398804257931264
author Aflitos, Saulo Alves
Severing, Edouard
Sanchez-Perez, Gabino
Peters, Sander
de Jong, Hans
de Ridder, Dick
author_facet Aflitos, Saulo Alves
Severing, Edouard
Sanchez-Perez, Gabino
Peters, Sander
de Jong, Hans
de Ridder, Dick
author_sort Aflitos, Saulo Alves
collection PubMed
description BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULTS: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100 % identification accuracy at supra-species level and 78 % accuracy at the species level. CONCLUSION: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0806-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4630969
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4630969 2015-11-04 Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data Aflitos, Saulo Alves Severing, Edouard Sanchez-Perez, Gabino Peters, Sander de Jong, Hans de Ridder, Dick BMC Bioinformatics Software BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULTS: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100 % identification accuracy at supra-species level and 78 % accuracy at the species level. CONCLUSION: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0806-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-02 /pmc/articles/PMC4630969/ /pubmed/26525298 http://dx.doi.org/10.1186/s12859-015-0806-7 Text en © Aflitos et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Aflitos, Saulo Alves
Severing, Edouard
Sanchez-Perez, Gabino
Peters, Sander
de Jong, Hans
de Ridder, Dick
Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title_full Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title_fullStr Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title_full_unstemmed Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title_short Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
title_sort cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome ngs data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4630969/
https://www.ncbi.nlm.nih.gov/pubmed/26525298
http://dx.doi.org/10.1186/s12859-015-0806-7