CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads
Summary: Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. The emerging long-read sequencing technologies can now produce reads of tens of kilobases in length that promise a more accurate assessment o...
Main Authors: | Borozan, Ivan; Ferretti, Vincent |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2016
|
Subjects: | Applications Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734043/ https://www.ncbi.nlm.nih.gov/pubmed/26454281 http://dx.doi.org/10.1093/bioinformatics/btv587 |
_version_ | 1782412880842326016 |
---|---|
author | Borozan, Ivan Ferretti, Vincent |
author_facet | Borozan, Ivan Ferretti, Vincent |
author_sort | Borozan, Ivan |
collection | PubMed |
description | Summary: Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. The emerging long-read sequencing technologies can now produce reads of tens of kilobases in length that promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model that we have demonstrated to improve the classification accuracy when compared with other state of the art classification methods. For the purpose of validation, and to demonstrate its usefulness, we test the combined sequence similarity score classifier (CSSSCL) using three different datasets, including a metagenomic dataset composed of short reads. Availability and implementation: Package’s source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl. Contact: ivan.borozan@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4734043 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-47340432016-02-02 CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads Borozan, Ivan Ferretti, Vincent Bioinformatics Applications Notes Summary: Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. The emerging long-read sequencing technologies can now produce reads of tens of kilobases in length that promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model that we have demonstrated to improve the classification accuracy when compared with other state of the art classification methods. For the purpose of validation, and to demonstrate its usefulness, we test the combined sequence similarity score classifier (CSSSCL) using three different datasets, including a metagenomic dataset composed of short reads. Availability and implementation: Package’s source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl. Contact: ivan.borozan@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-02-01 2015-10-09 /pmc/articles/PMC4734043/ /pubmed/26454281 http://dx.doi.org/10.1093/bioinformatics/btv587 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Borozan, Ivan Ferretti, Vincent CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title | CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title_full | CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title_fullStr | CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title_full_unstemmed | CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title_short | CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
title_sort | cssscl: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734043/ https://www.ncbi.nlm.nih.gov/pubmed/26454281 http://dx.doi.org/10.1093/bioinformatics/btv587 |
work_keys_str_mv | AT borozanivan csssclapythonpackagethatusescombinedsequencesimilarityscoresforaccuratetaxonomicclassificationoflongandshortsequencereads AT ferrettivincent csssclapythonpackagethatusescombinedsequencesimilarityscoresforaccuratetaxonomicclassificationoflongandshortsequencereads |