
Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype so...


Bibliographic Details
Main Authors:	Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606171/ https://www.ncbi.nlm.nih.gov/pubmed/26495297 http://dx.doi.org/10.1155/2015/506052

_version_	1782395328701399040
author	Kikuchi, Akihito Ikemura, Toshimichi Abe, Takashi
author_facet	Kikuchi, Akihito Ikemura, Toshimichi Abe, Takashi
author_sort	Kikuchi, Akihito
collection	PubMed
description	With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely on the basis of oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM requires large computational resources. We have developed Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows comprehensive analysis of big sequence data without high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. A first-layer BLSOM was constructed for each of the divided input data pieces, each representing a data subclass such as a phylotype division, which compresses the number of data pieces. The second-layer BLSOM was constructed from all of the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
format	Online Article Text
id	pubmed-4606171
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-46061712015-10-22 Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data Kikuchi, Akihito Ikemura, Toshimichi Abe, Takashi Biomed Res Int Research Article With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data. Hindawi Publishing Corporation 2015 2015-10-01 /pmc/articles/PMC4606171/ /pubmed/26495297 http://dx.doi.org/10.1155/2015/506052 Text en Copyright © 2015 Akihito Kikuchi et al. 
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Kikuchi, Akihito Ikemura, Toshimichi Abe, Takashi Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data
title	Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606171/ https://www.ncbi.nlm.nih.gov/pubmed/26495297 http://dx.doi.org/10.1155/2015/506052
work_keys_str_mv	AT kikuchiakihito developmentofselfcompressingblsomforcomprehensiveanalysisofbigsequencedata AT ikemuratoshimichi developmentofselfcompressingblsomforcomprehensiveanalysisofbigsequencedata AT abetakashi developmentofselfcompressingblsomforcomprehensiveanalysisofbigsequencedata
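
The two-layer strategy summarized in the description field above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a one-dimensional map, initialization from randomly sampled input vectors, dinucleotide (k = 2) composition, and synthetic random sequences standing in for two phylotypes. All function names (`kmer_freqs`, `batch_som`, `sc_blsom`, `random_seq`) are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def kmer_freqs(seq, k=2):
    """Return the normalized k-mer (oligonucleotide) frequency vector of a DNA sequence."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_som(data, n_nodes, epochs=20, seed=0):
    """Simplified 1-D batch-learning SOM (assumption: not the published BLSOM setup).

    Requires len(data) >= n_nodes. Returns the list of node weight vectors.
    """
    rng = random.Random(seed)
    weights = [list(v) for v in rng.sample(data, n_nodes)]
    dim = len(data[0])
    for epoch in range(epochs):
        # Neighborhood radius shrinks linearly, with a floor of 0.5.
        radius = max((n_nodes / 2) * (1 - epoch / epochs), 0.5)
        # Batch step 1: assign every input vector to its best-matching node.
        bmus = [min(range(n_nodes), key=lambda j: sq_dist(v, weights[j])) for v in data]
        # Batch step 2: each node becomes a neighborhood-weighted mean of all inputs.
        updated = []
        for j in range(n_nodes):
            num, den = [0.0] * dim, 0.0
            for v, b in zip(data, bmus):
                h = math.exp(-((j - b) ** 2) / (2 * radius ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * v[d]
            updated.append([x / den for x in num] if den > 0 else weights[j])
        weights = updated
    return weights


def sc_blsom(data_by_class, nodes_per_class, top_nodes):
    """Two-layer scheme: one small map per class, then one map over all weight vectors."""
    first_layer = []
    for vectors in data_by_class.values():
        first_layer.extend(batch_som(vectors, nodes_per_class))
    return batch_som(first_layer, top_nodes), first_layer


def random_seq(rng, length, gc):
    """Hypothetical test data: random DNA with an expected G+C fraction of `gc`."""
    return "".join(
        rng.choice("GC") if rng.random() < gc else rng.choice("AT") for _ in range(length)
    )


rng = random.Random(1)
data_by_class = {
    "AT-rich": [kmer_freqs(random_seq(rng, 200, 0.3)) for _ in range(30)],
    "GC-rich": [kmer_freqs(random_seq(rng, 200, 0.7)) for _ in range(30)],
}
top_map, first_layer = sc_blsom(data_by_class, nodes_per_class=4, top_nodes=6)
print(len(data_by_class["AT-rich"]) * 2, "inputs ->", len(first_layer),
      "first-layer weight vectors ->", len(top_map), "second-layer nodes")
```

In this sketch the compression comes from the first layer: the second map is trained on the small set of first-layer weight vectors rather than on every input fragment, which is where a saving in computation time would be expected under these assumptions.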


Similar Items