Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data
BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULT...
Main authors: | Aflitos, Saulo Alves; Severing, Edouard; Sanchez-Perez, Gabino; Peters, Sander; de Jong, Hans; de Ridder, Dick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4630969/ https://www.ncbi.nlm.nih.gov/pubmed/26525298 http://dx.doi.org/10.1186/s12859-015-0806-7 |
_version_ | 1782398804257931264 |
---|---|
author | Aflitos, Saulo Alves Severing, Edouard Sanchez-Perez, Gabino Peters, Sander de Jong, Hans de Ridder, Dick |
author_sort | Aflitos, Saulo Alves |
collection | PubMed |
description | BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULTS: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100 % identification accuracy at supra-species level and 78 % accuracy at the species level. CONCLUSION: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0806-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4630969 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4630969 2015-11-04 Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data Aflitos, Saulo Alves Severing, Edouard Sanchez-Perez, Gabino Peters, Sander de Jong, Hans de Ridder, Dick BMC Bioinformatics Software BACKGROUND: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. RESULTS: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100 % identification accuracy at supra-species level and 78 % accuracy at the species level. CONCLUSION: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0806-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-02 /pmc/articles/PMC4630969/ /pubmed/26525298 http://dx.doi.org/10.1186/s12859-015-0806-7 Text en © Aflitos et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4630969/ https://www.ncbi.nlm.nih.gov/pubmed/26525298 http://dx.doi.org/10.1186/s12859-015-0806-7 |