GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array

DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in...

Bibliographic Details
Main Authors:	Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4123905/ https://www.ncbi.nlm.nih.gov/pubmed/25099887 http://dx.doi.org/10.1371/journal.pone.0103833

author	Suzuki, Shuji Kakuta, Masanori Ishida, Takashi Akiyama, Yutaka
collection	PubMed
description	DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131–165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem.
format	Online Article Text
id	pubmed-4123905
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4123905 2014-08-12 GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array Suzuki, Shuji Kakuta, Masanori Ishida, Takashi Akiyama, Yutaka PLoS One Research Article DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131–165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem. Public Library of Science 2014-08-06 /pmc/articles/PMC4123905/ /pubmed/25099887 http://dx.doi.org/10.1371/journal.pone.0103833 Text en © 2014 Suzuki et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4123905/ https://www.ncbi.nlm.nih.gov/pubmed/25099887 http://dx.doi.org/10.1371/journal.pone.0103833
