
Faster sequence homology searches by clustering subsequences


Bibliographic Details
Main authors:	Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393512/ https://www.ncbi.nlm.nih.gov/pubmed/25432166 http://dx.doi.org/10.1093/bioinformatics/btu780

Description
Summary:	Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. In measurements with metagenomic data, GHOSTZ was ∼2.2–2.8 times faster than RAPSearch and ∼185–261 times faster than BLASTX. Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
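
The candidate reduction mentioned in the summary can be illustrated with a small sketch. This is not the GHOSTZ implementation: it assumes fixed-length subsequences, uses Hamming distance as a stand-in metric, and clusters greedily. The function names and the `radius` and `threshold` parameters are hypothetical. The idea shown is that, once the distance from a query to a cluster representative is known, the triangle inequality gives a lower bound on the distance to every cluster member, so members that cannot fall within the threshold are skipped without being compared.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings (a metric, assumed here)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_clusters(subseqs, radius):
    """Greedy clustering (illustrative only): a subsequence joins the first
    representative within `radius`, otherwise it becomes a new representative.
    Each member is stored with its precomputed distance to the representative."""
    clusters = []  # list of (representative, [(member, dist_to_rep), ...])
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))
                break
        else:
            clusters.append((s, [(s, 0)]))
    return clusters


def search(query, clusters, threshold):
    """Return (hits, number of distance computations actually performed)."""
    hits, computed = [], 0
    for rep, members in clusters:
        d_qr = hamming(query, rep)
        computed += 1
        for member, d_rm in members:
            # Triangle inequality: d(q, m) >= |d(q, r) - d(r, m)|.
            # If the lower bound already exceeds the threshold, skip the member.
            if abs(d_qr - d_rm) > threshold:
                continue
            if d_rm == 0:
                d = d_qr  # member is identical to the representative
            else:
                d = hamming(query, member)
                computed += 1
            if d <= threshold:
                hits.append(member)
    return hits, computed


if __name__ == "__main__":
    db = ["ACDEFG", "ACDEFH", "ACDEYY", "WWWWWW", "WWWWWV", "KLMNPQ"]
    clusters = build_clusters(db, radius=2)
    hits, computed = search("ACDEFG", clusters, threshold=1)
    print(hits, computed, "of", len(db), "comparisons")
```

In this toy setting the pruned search returns the same hits as comparing the query against every database subsequence, because the bound only discards members that provably exceed the threshold. The saving depends on how tightly the clusters group similar subsequences.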
