
Structure, clustering and functional insights of repeats configurations in the upstream promoter region of the human coding genes

BACKGROUND: Repetitive DNA sequences (Repeats) are significant regions in the human genome that have a specific genomic distribution, structure, and several binding sites for genome architecture and function. In consequence, the possible configurations of Repeats in specific and dynamic regions like...


Bibliographic Details
Main authors: Tobar-Tosse, Fabian, Veléz, Patricia E., Ocampo-Toro, Eliana, Moreno, Pedro A.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288848/
https://www.ncbi.nlm.nih.gov/pubmed/30537933
http://dx.doi.org/10.1186/s12864-018-5196-6
_version_ 1783379869412884480
author Tobar-Tosse, Fabian
Veléz, Patricia E.
Ocampo-Toro, Eliana
Moreno, Pedro A.
collection PubMed
description BACKGROUND: Repetitive DNA sequences (Repeats) are significant regions in the human genome that have a specific genomic distribution, structure, and several binding sites for genome architecture and function. Consequently, the possible configurations of Repeats in specific and dynamic regions like the gene promoters could define footprints for molecular mechanisms, pathways, and cell function beyond their density in the genome. Here we explored the distribution of Repeats in the upstream promoter region of the human coding genes with the aim of identifying specific configurations, clusters, and the functional meaning of those elements. Our method includes structural descriptions, hierarchical clustering, pathway association, and functional enrichment analysis. RESULTS: We report here several configurations of Repeats in the upstream promoter region (UPR), which define 2729 patterns for 80% of the human coding genes. There are 47 types of Repeats in these configurations, of which the most frequent were Alu, Low_complexity, MIR, Simple_repeat, LINE/L2, LINE/L1, hAT-Charlie, and ERV1. The distribution, length, and high frequency of Repeats in the UPR define several patterns and clusters, where the minimum frequency of configuration among Repeats was higher than 0.7. We found those clusters to be associated with cellular pathways and ontologies; thus, it was plausible to relate groups of Repeats to specific functional insights. For example, pathways for Genetic Information Processing or Metabolism show particular groups of Repeats with specific configurations. CONCLUSION: Based on these findings, we propose that specific configurations of repetitive elements describe frequent patterns in the upstream promoter for sets of human coding genes, and that those patterns are correlated with specific and essential cell pathways and functions. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5196-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6288848
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6288848 2018-12-14 Structure, clustering and functional insights of repeats configurations in the upstream promoter region of the human coding genes Tobar-Tosse, Fabian Veléz, Patricia E. Ocampo-Toro, Eliana Moreno, Pedro A. BMC Genomics Research BioMed Central 2018-12-11 /pmc/articles/PMC6288848/ /pubmed/30537933 http://dx.doi.org/10.1186/s12864-018-5196-6 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Structure, clustering and functional insights of repeats configurations in the upstream promoter region of the human coding genes
topic Research