Structural and Functional Classification of G-Quadruplex Families within the Human Genome
G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar seq...
Main authors: | Neupane, Aryan; Chariker, Julia H.; Rouchka, Eric C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048163/ https://www.ncbi.nlm.nih.gov/pubmed/36980918 http://dx.doi.org/10.3390/genes14030645 |
_version_ | 1785014111490277376 |
---|---|
author | Neupane, Aryan; Chariker, Julia H.; Rouchka, Eric C. |
author_sort | Neupane, Aryan |
collection | PubMed |
description | G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar sequence, structure, and thermodynamic properties are known to exist. We present a framework for clustering G-quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Utilizing this framework to filter and annotate clusters, 95 families of G-quadruplex sequences were identified within the human genome. Profiles for each family were created using hidden Markov models to allow for the identification of additional family members and generate homology probability scores. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding motifs within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G-quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster. |
format | Online Article Text |
id | pubmed-10048163 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10048163 2023-03-29 Structural and Functional Classification of G-Quadruplex Families within the Human Genome Neupane, Aryan; Chariker, Julia H.; Rouchka, Eric C. Genes (Basel) Article G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar sequence, structure, and thermodynamic properties are known to exist. We present a framework for clustering G-quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Utilizing this framework to filter and annotate clusters, 95 families of G-quadruplex sequences were identified within the human genome. Profiles for each family were created using hidden Markov models to allow for the identification of additional family members and generate homology probability scores. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding motifs within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G-quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster. MDPI 2023-03-04 /pmc/articles/PMC10048163/ /pubmed/36980918 http://dx.doi.org/10.3390/genes14030645 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Structural and Functional Classification of G-Quadruplex Families within the Human Genome |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048163/ https://www.ncbi.nlm.nih.gov/pubmed/36980918 http://dx.doi.org/10.3390/genes14030645 |