
CD-HIT: accelerated for clustering the next-generation sequencing data

Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Bibliographic details
Main authors:	Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong
Format:	Online article, text
Language:	English
Published:	Oxford University Press, 2012
Journal:	Bioinformatics
Dates:	2012-12-01 (issue); 2012-10-11 (online)
Subjects:	Applications Note
Source:	PubMed (National Center for Biotechnology Information)
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142/ https://www.ncbi.nlm.nih.gov/pubmed/23060610 http://dx.doi.org/10.1093/bioinformatics/bts565
Rights:	© The Author 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
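To make "clustering biological sequences to reduce sequence redundancy" concrete, here is a minimal Python sketch of greedy incremental clustering, the general scheme CD-HIT is known for: sequences are processed longest first, and each one either joins the first cluster representative it matches at or above an identity threshold or founds a new cluster. This is an illustration under stated assumptions, not CD-HIT's implementation. It omits CD-HIT's short-word filtering, real sequence alignment and the parallelization strategy the summary describes. The function names and the LCS-based identity measure are hypothetical stand-ins chosen for brevity.

```python
from typing import Dict, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity: matched residues divided by the length of the shorter sequence.

    Assumption: an LCS stands in for a real alignment; CD-HIT itself does not work this way.
    """
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_incremental_cluster(
    seqs: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, List[str]]]:
    """Hypothetical sketch: longest sequence first; each sequence joins the first
    representative it matches at >= threshold identity, otherwise it starts a new cluster."""
    # sorted() is stable, so equal-length sequences keep their input order.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: List[Tuple[str, List[str]]] = []  # (representative, members)
    for name in order:
        for rep, members in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                members.append(name)
                break
        else:  # no representative matched: this sequence founds a new cluster
            clusters.append((name, [name]))
    return clusters


if __name__ == "__main__":
    reads = {
        "r1": "ACGTACGTACGT",
        "r2": "ACGTACGTACG",  # r1 without its last base
        "r3": "TTTTGGGGCCCC",
    }
    for rep, members in greedy_incremental_cluster(reads, threshold=0.9):
        print(rep, members)
```

With these toy reads, r2 is absorbed into r1's cluster and r3 stays on its own, so three inputs reduce to two representatives. Comparing each new sequence only against cluster representatives, rather than against every other sequence, is what makes the greedy scheme cheap; the pairwise comparison step is also the natural place where a parallel implementation would spread work across cores.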


Similar items