Cargando…

Conservation of k-mer Composition and Correlation Contribution between Introns and Intergenic Regions of Animalia Genomes

In this study, we pairwise-compared multiple genome regions, including genes, exons, coding DNA sequences (CDS), introns, and intergenic regions of 39 Animalia genomes, including Deuterostomia (27 species) and Protostomia (12 species), by applying established k-mer-based (alignment-free) comparison...

Descripción completa

Detalles Bibliográficos
Autores principales: Sievers, Aaron, Wenz, Frederik, Hausmann, Michael, Hildenbrand, Georg
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211125/
https://www.ncbi.nlm.nih.gov/pubmed/30287792
http://dx.doi.org/10.3390/genes9100482
Descripción
Sumario:In this study, we pairwise-compared multiple genome regions, including genes, exons, coding DNA sequences (CDS), introns, and intergenic regions of 39 Animalia genomes, including Deuterostomia (27 species) and Protostomia (12 species), by applying established k-mer-based (alignment-free) comparison methods. We found strong correlations between the sequence structure of introns and intergenic regions, individual organisms, and within wider phylogenetical ranges, indicating the conservation of certain structures over the full range of analyzed organisms. We analyzed these sequence structures by quantifying the contribution of different sets of DNA words to the average correlation value by decomposing the correlation coefficients with respect to these word sets. We found that the conserved structures within introns, intergenic regions, and between the two were mainly a result of conserved tandem repeats with repeat units ≤ 2 bp (e.g., (AT)(n)), while other conserved sequence structures, such as those found between exons and CDS, were dominated by tandem repeats with repeat unit sizes of 3 bp in length and more complex DNA word patterns. We conclude that the conservation between intron and intergenic regions indicates a shared function of these sequence structures. Also, the similar differences in conserved structures with known origin, especially to the conservation between exons and CDS resulting from DNA codons, indicate that k-mer composition-based functional properties of introns and intergenic regions may differ from those of exons and CDS.