
CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads


Bibliographic Details
Main Authors: Borozan, Ivan, Ferretti, Vincent
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734043/
https://www.ncbi.nlm.nih.gov/pubmed/26454281
http://dx.doi.org/10.1093/bioinformatics/btv587
author Borozan, Ivan
Ferretti, Vincent
collection PubMed
description Summary: Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. Emerging long-read sequencing technologies can now produce reads tens of kilobases in length, which promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model, which we have shown to improve classification accuracy compared with other state-of-the-art classification methods. To validate the combined sequence similarity score classifier (CSSSCL) and demonstrate its usefulness, we test it on three different datasets, including a metagenomic dataset composed of short reads. Availability and implementation: The package’s source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl. Contact: ivan.borozan@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4734043
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4734043 2016-02-02 CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads Borozan, Ivan Ferretti, Vincent Bioinformatics Applications Notes Oxford University Press 2016-02-01 2015-10-09 /pmc/articles/PMC4734043/ /pubmed/26454281 http://dx.doi.org/10.1093/bioinformatics/btv587 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734043/
https://www.ncbi.nlm.nih.gov/pubmed/26454281
http://dx.doi.org/10.1093/bioinformatics/btv587