
Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project

In the human genome, it has been estimated that considerably more sequence is under natural selection in non-coding regions [such as transcription-factor binding sites (TF-binding sites) and non-coding RNAs (ncRNAs)] compared to protein-coding ones. However, less attention has been paid to them. To...


Bibliographic Details
Main Authors: Mu, Xinmeng Jasmine, Lu, Zhi John, Kong, Yong, Lam, Hugo Y. K., Gerstein, Mark B.
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects: Genomics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3167619/
https://www.ncbi.nlm.nih.gov/pubmed/21596777
http://dx.doi.org/10.1093/nar/gkr342
author Mu, Xinmeng Jasmine
Lu, Zhi John
Kong, Yong
Lam, Hugo Y. K.
Gerstein, Mark B.
collection PubMed
description In the human genome, it has been estimated that considerably more sequence is under natural selection in non-coding regions [such as transcription-factor binding sites (TF-binding sites) and non-coding RNAs (ncRNAs)] compared to protein-coding ones. However, less attention has been paid to them. To study selective pressure on non-coding elements, we use next-generation sequencing data from the recently completed pilot phase of the 1000 Genomes Project, which, compared to traditional methods, allows for the characterization of a full spectrum of genomic variations, including single-nucleotide polymorphisms (SNPs), short insertions and deletions (indels) and structural variations (SVs). We develop a framework for combining these variation data with non-coding elements, calculating various population-based metrics to compare classes and subclasses of elements, and developing element-aware aggregation procedures to probe the internal structure of an element. Overall, we find that TF-binding sites and ncRNAs are less selectively constrained for SNPs than coding sequences (CDSs), but more constrained than a neutral reference. We also determine that the relative amounts of constraint for the three types of variations are, in general, correlated, but there are some differences: counter-intuitively, TF-binding sites and ncRNAs are more selectively constrained for indels than for SNPs, compared to CDSs. After inspecting the overall properties of a class of elements, we analyze selective pressure on subclasses within an element class, and show that the extent of selection is associated with the genomic properties of each subclass. We find, for instance, that ncRNAs with higher expression levels tend to be under stronger purifying selection, and the actual regions of TF-binding motifs are under stronger selective pressure than the corresponding peak regions. 
Further, we develop element-aware aggregation plots to analyze selective pressure across the linear structure of an element, with the confidence intervals evaluated using both simple bootstrapping and block bootstrapping techniques. We find, for example, that both micro-RNAs (particularly the seed regions) and their binding targets are under stronger selective pressure for SNPs than their immediate genomic surroundings. In addition, we demonstrate that substitutions in TF-binding motifs inversely correlate with site conservation, and SNPs unfavorable for motifs are under more selective constraints than favorable SNPs. Finally, to further investigate intra-element differences, we show that SVs have the tendency to use distinctive modes and mechanisms when they interact with genomic elements, such as enveloping whole gene(s) rather than disrupting them partially, as well as duplicating TF motifs in tandem.
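The description above mentions confidence intervals evaluated with both simple bootstrapping and block bootstrapping. The following is a minimal sketch of the difference between the two, not the authors' implementation. It assumes the input is a list of per-position values of some metric (for example, SNP counts) along an aggregated element, and it uses a moving-block variant of the block bootstrap. The function names, the toy data, and the choice of the mean as the statistic are all hypothetical.

```python
import random
from statistics import mean


def _percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 1]."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[idx]


def simple_bootstrap_ci(values, n_resamples=1000, alpha=0.05, rng=None):
    """Percentile CI of the mean, resampling single positions independently."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    return _percentile(means, alpha / 2), _percentile(means, 1 - alpha / 2)


def block_bootstrap_ci(values, block_size, n_resamples=1000, alpha=0.05, rng=None):
    """Percentile CI of the mean, resampling contiguous blocks of positions.

    Keeping neighbouring positions together preserves short-range
    correlation that the simple bootstrap ignores.
    """
    n = len(values)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(values)")
    if rng is None:
        rng = random.Random(0)
    starts = range(n - block_size + 1)   # every valid block start position
    n_blocks = -(-n // block_size)       # ceiling division
    means = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(values[s:s + block_size])
        means.append(mean(sample[:n]))   # trim to the original length
    means.sort()
    return _percentile(means, alpha / 2), _percentile(means, 1 - alpha / 2)


# Toy, made-up per-position counts; not data from the paper.
toy_counts = [0.0, 1.0, 1.0, 3.0, 4.0, 3.0, 1.0, 0.0, 0.0, 2.0, 3.0, 2.0]
print("simple:", simple_bootstrap_ci(toy_counts))
print("block: ", block_bootstrap_ci(toy_counts, block_size=3))
```

When neighbouring positions are correlated, the block version is generally expected to give wider and more realistic intervals than the simple version. The block size is a tuning choice that this sketch leaves to the caller.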
format Online
Article
Text
id pubmed-3167619
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3167619 2011-09-06 Nucleic Acids Res Genomics Oxford University Press 2011-09 2011-05-19 /pmc/articles/PMC3167619/ /pubmed/21596777 http://dx.doi.org/10.1093/nar/gkr342 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project
topic Genomics