
A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map...


Bibliographic Details
Main authors: Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3996302/
https://www.ncbi.nlm.nih.gov/pubmed/24804244
http://dx.doi.org/10.1155/2014/765648
author Bai, Yu
Iwasaki, Yuki
Kanaya, Shigehiko
Zhao, Yue
Ikemura, Toshimichi
author_sort Bai, Yu
collection PubMed
description With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we previously developed the Batch-Learning SOM (BLSOM), which classifies sequence fragments according to species solely on the basis of oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM for characterizing vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific combination of oligonucleotide frequencies in each genome, called a “genome signature,” as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequence (i.e., big sequence data).
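The description mentions pentanucleotide compositions computed over 100 kb sequences. As a rough illustration of what such an input vector could look like, here is a minimal Python sketch. It is not the authors' implementation and does not implement BLSOM itself; the function names (`kmer_index`, `kmer_frequencies`, `window_vectors`), the use of non-overlapping windows, the dropping of a trailing partial window, and the skipping of words containing ambiguous bases are all assumptions made for this example.

```python
from functools import lru_cache
from itertools import product

K = 5             # pentanucleotides, as stated in the description
WINDOW = 100_000  # 100 kb windows, as stated in the description


@lru_cache(maxsize=None)
def kmer_index(k):
    """Map every A/C/G/T word of length k to a fixed position (4**k words)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_frequencies(seq, k=K):
    """Return the relative frequency of each k-mer in seq as a list of 4**k floats.

    Assumption: words containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    index = kmer_index(k)
    counts = [0] * (4 ** k)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    # Normalize so windows are comparable; all zeros if no valid word was found.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def window_vectors(seq, window=WINDOW, k=K):
    """Split seq into non-overlapping windows and return one frequency vector each.

    Assumption: a trailing window shorter than `window` is dropped.
    """
    return [
        kmer_frequencies(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]
```

Each resulting vector (1,024 values for k = 5) is the kind of high-dimensional data point that a SOM could then cluster and place on a map; how BLSOM initializes and trains that map is not covered by this record and is not shown here.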
format Online
Article
Text
id pubmed-3996302
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3996302 2014-05-06 A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi Biomed Res Int Research Article Hindawi Publishing Corporation 2014 2014-04-03 /pmc/articles/PMC3996302/ /pubmed/24804244 http://dx.doi.org/10.1155/2014/765648 Text en Copyright © 2014 Yu Bai et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data
topic Research Article