
CD-HIT: accelerated for clustering the next-generation sequencing data

Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have devel...


Bibliographic Details
Main authors: Fu, Limin, Niu, Beifang, Zhu, Zhengwei, Wu, Sitao, Li, Weizhong
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142/
https://www.ncbi.nlm.nih.gov/pubmed/23060610
http://dx.doi.org/10.1093/bioinformatics/bts565
author Fu, Limin
Niu, Beifang
Zhu, Zhengwei
Wu, Sitao
Li, Weizhong
collection PubMed
description Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
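The description above says that clustering is used to reduce sequence redundancy but does not show how. As a rough illustration only, the sketch below follows the greedy incremental scheme commonly associated with CD-HIT: sequences are processed from longest to shortest, and each one either joins the first representative it matches closely enough or becomes a new representative. The function names, the 0.85 threshold, and the ungapped identity measure are assumptions made for this example; they are not taken from the CD-HIT source, and the sketch is sequential, so it does not model the parallelization strategy or the other techniques the abstract mentions.

```python
def ungapped_identity(short, long_):
    """Toy identity measure (an assumption for this sketch): the best
    fraction of `short` that matches when it is slid along `long_`
    without gaps. Real tools use alignment-based identity instead."""
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        # zip stops at the end of `short`, so each window has len(short) positions
        matches = sum(a == b for a, b in zip(short, long_[offset:]))
        best = max(best, matches)
    return best / len(short)


def greedy_cluster(sequences, threshold=0.85):
    """Greedy incremental clustering: longest sequences first; each
    sequence joins the first representative whose identity to it
    reaches `threshold`, otherwise it starts a new cluster."""
    clusters = []  # list of (representative, members) pairs
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if ungapped_identity(seq, representative) >= threshold:
                members.append(seq)
                break
        else:
            # no representative was similar enough: seq becomes one
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTA", "TTTTTTTT", "ACGTACGAAC"]
    for representative, members in greedy_cluster(reads):
        print(representative, len(members))
```

Keeping only the representatives from the returned pairs gives the reduced, non-redundant set; a lower threshold merges more sequences into each cluster.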
format Online
Article
Text
id pubmed-3516142
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3516142 2012-12-12 Bioinformatics Applications Note Oxford University Press 2012-12-01 2012-10-11 /pmc/articles/PMC3516142/ /pubmed/23060610 http://dx.doi.org/10.1093/bioinformatics/bts565 Text en © The Author 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CD-HIT: accelerated for clustering the next-generation sequencing data
topic Applications Note