
Structural and Functional Classification of G-Quadruplex Families within the Human Genome

G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar seq...

Bibliographic Details
Main authors: Neupane, Aryan; Chariker, Julia H.; Rouchka, Eric C.
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048163/
https://www.ncbi.nlm.nih.gov/pubmed/36980918
http://dx.doi.org/10.3390/genes14030645
collection PubMed
description G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar sequence, structure, and thermodynamic properties are known to exist. We present a framework for clustering G-quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Utilizing this framework to filter and annotate clusters, 95 families of G-quadruplex sequences were identified within the human genome. Profiles for each family were created using hidden Markov models to allow for the identification of additional family members and generate homology probability scores. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding motifs within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G-quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster.
id pubmed-10048163
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Genes (Basel)
published 2023-03-04
license © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).