Spatial constrains and information content of sub-genomic regions of the human genome

Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique) with the purpose to ask questions regarding the segmental organization of the human genome within the size distr...

Bibliographic Details
Main Authors: Karakatsanis, Leonidas P., Pavlos, Evgenios G., Tsoulouhas, George, Stamokostas, Georgios L., Mosbruger, Timothy, Duke, Jamie L., Pavlos, George P., Monos, Dimitri S.
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7843455/
https://www.ncbi.nlm.nih.gov/pubmed/33554061
http://dx.doi.org/10.1016/j.isci.2021.102048
author Karakatsanis, Leonidas P.
Pavlos, Evgenios G.
Tsoulouhas, George
Stamokostas, Georgios L.
Mosbruger, Timothy
Duke, Jamie L.
Pavlos, George P.
Monos, Dimitri S.
collection PubMed
description Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique) with the purpose to ask questions regarding the segmental organization of the human genome within the size distribution of these sequences. For this we developed an integrated methodology that is based upon the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a technical index, integrating the generated information, which we introduce and name the complexity factor (COFA). Our analysis revealed that the size distributions of the genomic regions within chromosomes are not random but follow patterns with characteristic features that are seen through their complexity character and that are part of the dynamics of the whole genome. Finally, this picture of dynamics in DNA is recognized using ML tools for clustering, classification, and prediction with high accuracy.
format Online
Article
Text
id pubmed-7843455
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7843455 2021-02-04 Spatial constrains and information content of sub-genomic regions of the human genome iScience Article Elsevier 2021-01-10 /pmc/articles/PMC7843455/ /pubmed/33554061 http://dx.doi.org/10.1016/j.isci.2021.102048 Text en © 2021 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Spatial constrains and information content of sub-genomic regions of the human genome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7843455/
https://www.ncbi.nlm.nih.gov/pubmed/33554061
http://dx.doi.org/10.1016/j.isci.2021.102048