
A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data

Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a...

Bibliographic Details
Main Authors:	Kakuta, Masanori; Suzuki, Shuji; Izawa, Kazuki; Ishida, Takashi; Akiyama, Yutaka
Format:	Online Article Text
Language:	English
Published:	MDPI 2017
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666806/ https://www.ncbi.nlm.nih.gov/pubmed/29019934 http://dx.doi.org/10.3390/ijms18102124

author	Kakuta, Masanori; Suzuki, Shuji; Izawa, Kazuki; Ishida, Takashi; Akiyama, Yutaka
author_facet	Kakuta, Masanori; Suzuki, Shuji; Izawa, Kazuki; Ishida, Takashi; Akiyama, Yutaka
author_sort	Kakuta, Masanori
collection	PubMed
description	Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total analysis time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to be scalable over 10,000 CPU (Central Processing Unit) cores, and achieved over 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
format	Online Article Text
id	pubmed-5666806
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-5666806 2017-11-09 A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data Kakuta, Masanori; Suzuki, Shuji; Izawa, Kazuki; Ishida, Takashi; Akiyama, Yutaka Int J Mol Sci Article MDPI 2017-10-11 /pmc/articles/PMC5666806/ /pubmed/29019934 http://dx.doi.org/10.3390/ijms18102124 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666806/ https://www.ncbi.nlm.nih.gov/pubmed/29019934 http://dx.doi.org/10.3390/ijms18102124

Similar Items