Faster sequence homology searches by clustering subsequences

Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic ana...

Bibliographic Details
Main authors: Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393512/
https://www.ncbi.nlm.nih.gov/pubmed/25432166
http://dx.doi.org/10.1093/bioinformatics/btu780
author Suzuki, Shuji
Kakuta, Masanori
Ishida, Takashi
Akiyama, Yutaka
collection PubMed
description Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2–2.8 times faster than RAPSearch and is ∼185–261 times faster than BLASTX. Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4393512
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-43935122015-06-26 Faster sequence homology searches by clustering subsequences Suzuki, Shuji Kakuta, Masanori Ishida, Takashi Akiyama, Yutaka Bioinformatics Original Papers Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2–2.8 times faster than RAPSearch and is ∼185–261 times faster than BLASTX. Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2015-04-15 2014-11-27 /pmc/articles/PMC4393512/ /pubmed/25432166 http://dx.doi.org/10.1093/bioinformatics/btu780 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Faster sequence homology searches by clustering subsequences
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393512/
https://www.ncbi.nlm.nih.gov/pubmed/25432166
http://dx.doi.org/10.1093/bioinformatics/btu780
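
The description field above says the method clusters similar database subsequences and reduces alignment candidates based on the triangle inequality. As a rough illustration only, the sketch below shows that general idea on fixed-length subsequences. It is not GHOSTZ's actual implementation: the use of Hamming distance, the greedy clustering in build_clusters, and the names radius and threshold are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

Cluster = Tuple[str, List[Tuple[str, int]]]  # (representative, [(member, distance to representative)])


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_clusters(subseqs: List[str], radius: int) -> List[Cluster]:
    """Hypothetical greedy clustering: a subsequence joins the first
    representative within `radius`, otherwise it starts a new cluster."""
    clusters: List[Cluster] = []
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))  # store the distance for later pruning
                break
        else:
            clusters.append((s, [(s, 0)]))
    return clusters


def search(query: str, clusters: List[Cluster], threshold: int) -> Tuple[List[str], int]:
    """Return members within `threshold` of `query`, plus the number of
    distance computations actually performed."""
    hits: List[str] = []
    computed = 0
    for rep, members in clusters:
        d_qr = hamming(query, rep)
        computed += 1
        for member, d_rm in members:
            # Triangle inequality: d(q, m) >= |d(q, r) - d(r, m)|.
            # If this lower bound already exceeds the threshold, skip the member.
            if abs(d_qr - d_rm) > threshold:
                continue
            if d_rm == 0:
                d = d_qr  # member is identical to the representative
            else:
                d = hamming(query, member)
                computed += 1
            if d <= threshold:
                hits.append(member)
    return hits, computed


if __name__ == "__main__":
    database = ["ACDEF", "ACDEG", "ACDEH", "WWWWW", "WWWWY"]
    clusters = build_clusters(database, radius=1)
    hits, computed = search("ACDEF", clusters, threshold=0)
    print(hits, computed)
```

Because the bound is only a lower bound on the true distance, pruning never discards a real match under a metric; it only avoids computing distances that cannot pass the threshold.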