Faster sequence homology searches by clustering subsequences
Main authors: | Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393512/; https://www.ncbi.nlm.nih.gov/pubmed/25432166; http://dx.doi.org/10.1093/bioinformatics/btu780 |
author | Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka |
---|---|
collection | PubMed |
description | Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. This method clusters similar subsequences from a database and uses the triangle inequality to reduce the number of alignment candidates, making the seed search and ungapped extension more efficient. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When measured on metagenomic data, GHOSTZ was ∼2.2–2.8 times faster than RAPSearch and ∼185–261 times faster than BLASTX. Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. |
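The description says GHOSTZ clusters similar database subsequences and uses the triangle inequality to discard alignment candidates. The Python sketch below illustrates only that pruning idea; it is not GHOSTZ's implementation, and the distance and clustering procedure the paper actually uses are not reproduced here. It assumes a Hamming distance between equal-length subsequences (a true metric, so the triangle inequality holds), a simple greedy leader clustering with a hypothetical `max_radius`, and a hypothetical distance `threshold` standing in for the score cut-off of a real seed search. All names (`hamming`, `Cluster`, `greedy_cluster`, `search`) are illustrative. The key step: if d(q, r) - radius > threshold for a cluster with representative r, then every member m satisfies d(q, m) >= d(q, r) - d(r, m) >= d(q, r) - radius > threshold, so the whole cluster is skipped without comparing its members.

```python
from dataclasses import dataclass, field


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length subsequences (a metric)."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


@dataclass
class Cluster:
    representative: str
    # (member subsequence, precomputed distance to the representative)
    members: list[tuple[str, int]] = field(default_factory=list)
    radius: int = 0  # largest member-to-representative distance


def greedy_cluster(subseqs: list[str], max_radius: int) -> list[Cluster]:
    """Leader clustering (an assumption, not GHOSTZ's method): join the first
    cluster whose representative is within max_radius, else start a new one."""
    clusters: list[Cluster] = []
    for s in subseqs:
        for c in clusters:
            d = hamming(s, c.representative)
            if d <= max_radius:
                c.members.append((s, d))
                c.radius = max(c.radius, d)
                break
        else:
            clusters.append(Cluster(representative=s, members=[(s, 0)]))
    return clusters


def search(query: str, clusters: list[Cluster], threshold: int) -> tuple[list[str], int]:
    """Return subsequences within threshold of query and the number of
    distance computations performed."""
    hits: list[str] = []
    computed = 0
    for c in clusters:
        d_qr = hamming(query, c.representative)
        computed += 1
        # Triangle inequality: d(q, m) >= d(q, r) - d(r, m) >= d(q, r) - radius.
        if d_qr - c.radius > threshold:
            continue  # no member of this cluster can be a hit
        for m, d_rm in c.members:
            # Tighter per-member bound: d(q, m) >= |d(q, r) - d(r, m)|.
            if abs(d_qr - d_rm) > threshold:
                continue
            if d_rm == 0:
                d_qm = d_qr  # Hamming distance 0 means m equals the representative
            else:
                d_qm = hamming(query, m)
                computed += 1
            if d_qm <= threshold:
                hits.append(m)
    return hits, computed


db = ["ACDE", "ACDF", "ACGF", "WWYY", "WWYH", "KLMN"]
clusters = greedy_cluster(db, max_radius=1)
hits, computed = search("ACDF", clusters, threshold=1)
print(hits, computed)  # ['ACDE', 'ACDF', 'ACGF'] 5
```

On this toy input the query needs 5 distance computations instead of 6, because the `WWYY` cluster is rejected from its representative alone and `WWYH` is never compared. On real databases the savings depend on how tightly the clusters are packed relative to the search threshold.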
format | Online Article Text |
id | pubmed-4393512 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4393512; 2015-06-26; Bioinformatics; Original Papers; Oxford University Press; 2015-04-15; 2014-11-27; © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Faster sequence homology searches by clustering subsequences |
topic | Original Papers |