
Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides

In biological systems, a few sequence differences diversify the hybridization profile of nucleotides and enable the quantitative control of cellular metabolism in a cooperative manner. In this respect, the information required for a better understanding may not be in each nucleotide sequence, but re...


Bibliographic Details
Main authors: Lee, Byoungsang, Ahn, So Yeon, Park, Charles, Moon, James J., Lee, Jung Heon, Luo, Dan, Um, Soong Ho, Shin, Seung Won
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6359743/
https://www.ncbi.nlm.nih.gov/pubmed/30669407
http://dx.doi.org/10.3390/molecules24020348
_version_ 1783392336590405632
author Lee, Byoungsang
Ahn, So Yeon
Park, Charles
Moon, James J.
Lee, Jung Heon
Luo, Dan
Um, Soong Ho
Shin, Seung Won
author_facet Lee, Byoungsang
Ahn, So Yeon
Park, Charles
Moon, James J.
Lee, Jung Heon
Luo, Dan
Um, Soong Ho
Shin, Seung Won
author_sort Lee, Byoungsang
collection PubMed
description In biological systems, a few sequence differences diversify the hybridization profile of nucleotides and enable the quantitative control of cellular metabolism in a cooperative manner. In this respect, the information required for a better understanding may lie not in each individual nucleotide sequence but in representative information shared among them. Existing methodologies for nucleotide sequence design have been optimized to track the function of the genetic molecule and predict its interaction with others. However, there has been no attempt to extract new sequence information to represent their inheritance function. Here, we tried to conceptually reveal the presence of a representative sequence from groups of nucleotides. The combined application of the K-means clustering algorithm and the social network analysis theorem enabled the effective calculation of the representative sequence. First, a “common sequence” is made that has the highest hybridization property to analog sequences. Next, the sequence complementary to the common sequence is designated as a “representative sequence”. Based on this, we obtained a representative sequence from multiple analog sequences that are 8–10 bases long. Their hybridization was empirically tested, which confirmed that the common sequence had the highest hybridization tendency and that the representative sequence aligned better with the analogs than a merely complementary sequence did.
format Online
Article
Text
id pubmed-6359743
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6359743 2019-02-06 Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides Lee, Byoungsang Ahn, So Yeon Park, Charles Moon, James J. Lee, Jung Heon Luo, Dan Um, Soong Ho Shin, Seung Won Molecules Article In biological systems, a few sequence differences diversify the hybridization profile of nucleotides and enable the quantitative control of cellular metabolism in a cooperative manner. In this respect, the information required for a better understanding may lie not in each individual nucleotide sequence but in representative information shared among them. Existing methodologies for nucleotide sequence design have been optimized to track the function of the genetic molecule and predict its interaction with others. However, there has been no attempt to extract new sequence information to represent their inheritance function. Here, we tried to conceptually reveal the presence of a representative sequence from groups of nucleotides. The combined application of the K-means clustering algorithm and the social network analysis theorem enabled the effective calculation of the representative sequence. First, a “common sequence” is made that has the highest hybridization property to analog sequences. Next, the sequence complementary to the common sequence is designated as a “representative sequence”. Based on this, we obtained a representative sequence from multiple analog sequences that are 8–10 bases long. Their hybridization was empirically tested, which confirmed that the common sequence had the highest hybridization tendency and that the representative sequence aligned better with the analogs than a merely complementary sequence did. MDPI 2019-01-18 /pmc/articles/PMC6359743/ /pubmed/30669407 http://dx.doi.org/10.3390/molecules24020348 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Lee, Byoungsang
Ahn, So Yeon
Park, Charles
Moon, James J.
Lee, Jung Heon
Luo, Dan
Um, Soong Ho
Shin, Seung Won
Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title_full Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title_fullStr Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title_full_unstemmed Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title_short Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides
title_sort revealing the presence of a symbolic sequence representing multiple nucleotides based on k-means clustering of oligonucleotides
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6359743/
https://www.ncbi.nlm.nih.gov/pubmed/30669407
http://dx.doi.org/10.3390/molecules24020348
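
The description field in this record outlines a two-step idea: derive a "common sequence" that hybridizes well with a group of analog sequences, then take its complement as the "representative sequence". The toy sketch below illustrates only the relationship between the two sequences. It is not the authors' method: the record does not give the details of the K-means and network-analysis calculation, so a simple per-position majority vote over equal-length analogs is swapped in as a stand-in. The function names, the example analogs, and the reading of "complementary" as the antiparallel reverse complement are all assumptions made for illustration.

```python
from collections import Counter

# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def majority_consensus(seqs):
    """Pick the most frequent base at each position.

    Hypothetical stand-in for the clustering step named in the abstract;
    assumes all sequences have the same length.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs)
    )


# Hypothetical 8-base analogs that differ at a single position each.
analogs = ["AAGCTTGC", "AAGCTAGC", "ATGCTTGC"]

# Stand-in for the representative sequence: resembles the analogs.
representative = majority_consensus(analogs)

# Stand-in for the common sequence: pairs with the representative,
# and therefore with most positions of every analog.
common = reverse_complement(representative)

print("representative:", representative)
print("common:        ", common)
```

In this sketch the representative sequence is built first and the common sequence is derived from it, which is the reverse of the order given in the abstract. Because complementation is its own inverse, the pair of sequences is the same either way.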