CD-HIT: accelerated for clustering the next-generation sequencing data
Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have devel...
Main Authors: | Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | Applications Note |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142/ https://www.ncbi.nlm.nih.gov/pubmed/23060610 http://dx.doi.org/10.1093/bioinformatics/bts565 |
_version_ | 1782252282096648192 |
---|---|
author | Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong |
author_facet | Fu, Limin; Niu, Beifang; Zhu, Zhengwei; Wu, Sitao; Li, Weizhong |
author_sort | Fu, Limin |
collection | PubMed |
description | Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-3516142 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-35161422012-12-12 CD-HIT: accelerated for clustering the next-generation sequencing data Fu, Limin Niu, Beifang Zhu, Zhengwei Wu, Sitao Li, Weizhong Bioinformatics Applications Note Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2012-12-01 2012-10-11 /pmc/articles/PMC3516142/ /pubmed/23060610 http://dx.doi.org/10.1093/bioinformatics/bts565 Text en © The Author 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Note Fu, Limin Niu, Beifang Zhu, Zhengwei Wu, Sitao Li, Weizhong CD-HIT: accelerated for clustering the next-generation sequencing data |
title | CD-HIT: accelerated for clustering the next-generation sequencing data |
title_full | CD-HIT: accelerated for clustering the next-generation sequencing data |
title_fullStr | CD-HIT: accelerated for clustering the next-generation sequencing data |
title_full_unstemmed | CD-HIT: accelerated for clustering the next-generation sequencing data |
title_short | CD-HIT: accelerated for clustering the next-generation sequencing data |
title_sort | cd-hit: accelerated for clustering the next-generation sequencing data |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142/ https://www.ncbi.nlm.nih.gov/pubmed/23060610 http://dx.doi.org/10.1093/bioinformatics/bts565 |
work_keys_str_mv | AT fulimin cdhitacceleratedforclusteringthenextgenerationsequencingdata AT niubeifang cdhitacceleratedforclusteringthenextgenerationsequencingdata AT zhuzhengwei cdhitacceleratedforclusteringthenextgenerationsequencingdata AT wusitao cdhitacceleratedforclusteringthenextgenerationsequencingdata AT liweizhong cdhitacceleratedforclusteringthenextgenerationsequencingdata |