
Use of 6 Nucleotide Length Words to Study the Complexity of Gene Sequences from Different Organisms

In this paper, we attempted to find a relation between the living conditions of bacteria and the algorithmic complexity of their genomes. We developed a probabilistic mathematical method for evaluating how irregularly k-words (6 bases long) occur in bacterial gene coding sequences. For this, the coding...


Bibliographic Details
Main Authors: Korotkov, Eugene; Zaytsev, Konstantin; Fedorov, Alexey
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141341/
https://www.ncbi.nlm.nih.gov/pubmed/35626518
http://dx.doi.org/10.3390/e24050632
author Korotkov, Eugene
Zaytsev, Konstantin
Fedorov, Alexey
collection PubMed
description In this paper, we attempted to find a relation between the living conditions of bacteria and the algorithmic complexity of their genomes. We developed a probabilistic mathematical method for evaluating how irregularly k-words (6 bases long) occur in bacterial gene coding sequences. For this, the coding sequences from different bacterial genomes were analyzed, and as an index of the irregularity of k-word occurrence we used [Formula: see text] , which has a distribution similar to normal. The results show that bacterial genomes can be divided into two groups of unequal size. The first, smaller group has W in the interval from 170 to 475, while for the second group W ranges from 475 to 875. Plant, metazoan, and virus genomes also have W in the same interval as the first bacterial group. We suggest that the coding sequences of the second bacterial group are much less susceptible to evolutionary changes than those of the first group. The use of the W index as a measure of biological stress is also discussed.
format Online
Article
Text
id pubmed-9141341
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9141341 2022-05-28
Entropy (Basel)
Article
MDPI 2022-04-30
/pmc/articles/PMC9141341/
/pubmed/35626518
http://dx.doi.org/10.3390/e24050632
Text
en
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Use of 6 Nucleotide Length Words to Study the Complexity of Gene Sequences from Different Organisms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141341/
https://www.ncbi.nlm.nih.gov/pubmed/35626518
http://dx.doi.org/10.3390/e24050632