Ultra-fast sequence clustering from similarity networks with SiLiX
Main authors: | , , |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3095554/ https://www.ncbi.nlm.nih.gov/pubmed/21513511 http://dx.doi.org/10.1186/1471-2105-12-116 |
Summary: | BACKGROUND: The number of gene sequences available for comparative genomics approaches is increasing extremely quickly. A current challenge is to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time. RESULTS: We present the software package SiLiX, which implements a novel method that reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. CONCLUSIONS: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX. |
---|
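
The summary describes single linkage clustering viewed as a graph problem. As a rough illustration of that general idea only (not SiLiX's actual algorithm or code), the sketch below treats each sequence as a node and each accepted similarity hit as an edge, then reports the connected components as families using a union-find structure. The function name `single_linkage_families` and the input format (pairs of sequence IDs assumed to have already passed a similarity threshold) are hypothetical choices made for this example.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequence IDs into families.

    Assumption: `pairs` holds (id_a, id_b) tuples for hits that already
    passed some similarity threshold. Families are the connected
    components of the resulting graph.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = defaultdict(set)
    for seq in list(parent):
        families[find(seq)].add(seq)
    # Sort members and families so the output is deterministic.
    return sorted((sorted(m) for m in families.values()), key=lambda m: m[0])


hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(single_linkage_families(hits))
```

Here `s1` and `s3` land in the same family although they share no direct hit, because both are linked to `s2`; that transitivity is what defines single linkage. Processing pairs one at a time means the full similarity network never needs to be held in memory, only one parent entry per sequence.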