
A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition...
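
The idea summarized above, clustering sequence fragments by oligonucleotide composition on a self-organizing map, can be sketched as follows. This is a minimal, generic batch-style SOM in plain Python, not the authors' BLSOM implementation. The function names (`kmer_frequencies`, `best_matching_unit`, `batch_som`), the choice of dinucleotides (k = 2), the 1 x 2 grid, the initialization scheme, and the toy fragments are all assumptions made for illustration.

```python
from itertools import product
import math

def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies of a DNA sequence, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]

def best_matching_unit(weights, vector):
    """Index of the node whose weight vector is closest (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], vector)))

def batch_som(data, rows, cols, epochs=20):
    """Generic batch SOM: assign all inputs first, then update every node at once."""
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialization: seed nodes from evenly spaced input vectors.
    weights = [list(data[(j * len(data)) // len(grid)]) for j in range(len(grid))]
    for epoch in range(epochs):
        # Neighborhood radius shrinks linearly, with a floor of 0.5.
        sigma = max(0.5, (max(rows, cols) / 2) * (1 - epoch / epochs))
        bmus = [best_matching_unit(weights, x) for x in data]
        new_weights = []
        for j, (rj, cj) in enumerate(grid):
            num, den = [0.0] * dim, 0.0
            for x, b in zip(data, bmus):
                rb, cb = grid[b]
                # Gaussian neighborhood weight between node j and the input's best node.
                h = math.exp(-((rj - rb) ** 2 + (cj - cb) ** 2) / (2 * sigma ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            # Keep the old weights if every neighborhood weight underflowed to zero.
            new_weights.append([n / den for n in num] if den > 0 else weights[j])
        weights = new_weights
    return weights, grid

# Toy input (invented): two AT-rich and two GC-rich fragments.
fragments = ["ATATATTAATATTTAA", "AATTATATAATTTATA",
             "GCGCGGCCGCGCCGGC", "GGCCGCGCGGCGCCGC"]
vectors = [kmer_frequencies(f, k=2) for f in fragments]
weights, grid = batch_som(vectors, rows=1, cols=2)
nodes = [best_matching_unit(weights, v) for v in vectors]
print(nodes)  # map node assigned to each fragment
```

Because every node is updated from all inputs at the end of each pass, the result of this batch-style update does not depend on the order in which the fragments are presented.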

Bibliographic Details
Main authors: Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi
Format: Online Article Text
Language: English
Published: MDPI 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029494/
https://www.ncbi.nlm.nih.gov/pubmed/27694768
http://dx.doi.org/10.3390/microorganisms1010137
author Iwasaki, Yuki
Abe, Takashi
Wada, Kennosuke
Wada, Yoshiko
Ikemura, Toshimichi
collection PubMed
description With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.
format Online
Article
Text
id pubmed-5029494
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5029494 2016-09-28 A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM) Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi Microorganisms Review
MDPI 2013-11-20 /pmc/articles/PMC5029494/ /pubmed/27694768 http://dx.doi.org/10.3390/microorganisms1010137 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. http://creativecommons.org/licenses/by/3.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)
topic Review