An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
Main authors: | Aggarwala, Varun; Voight, Benjamin F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4811712/ https://www.ncbi.nlm.nih.gov/pubmed/26878723 http://dx.doi.org/10.1038/ng.3511 |
_version_ | 1782424010962763776 |
---|---|
author | Aggarwala, Varun; Voight, Benjamin F. |
collection | PubMed |
description | The rate of single nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have only considered the immediate flanking nucleotides around a polymorphic site (the site’s trinucleotide sequence context) to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, revealing new mutation-promoting motifs at ApT dinucleotide, CAAT, and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases. |
format | Online Article Text |
id | pubmed-4811712 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
record_format | MEDLINE/PubMed |
spelling | pubmed-4811712 2016-08-15 Nat Genet Article 2016-02-15 2016-04 /pmc/articles/PMC4811712/ /pubmed/26878723 http://dx.doi.org/10.1038/ng.3511 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
title | An expanded sequence context model broadly explains variability in polymorphism levels across the human genome |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4811712/ https://www.ncbi.nlm.nih.gov/pubmed/26878723 http://dx.doi.org/10.1038/ng.3511 |
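The description above attributes most of the variability in substitution probabilities to the heptanucleotide (7-mer) context around a site. As a rough illustration only, and not the authors' statistical framework, the sketch below estimates a context-specific substitution probability by plain counting: for each 7-mer centred on a site, the number of sites carrying a given alternate base divided by the number of times that 7-mer occurs in the reference. The helper name `substitution_probabilities`, the input format (a reference string plus a dict from 0-based positions to alternate bases), and the convention of folding reverse complements so the central base is A or C are all assumptions made for this example.

```python
from collections import Counter

BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def substitution_probabilities(
    reference: str, variants: dict[int, str], k: int = 7
) -> dict[tuple[str, str], float]:
    """Estimate P(alt | k-mer context) by counting (hypothetical helper).

    reference: reference sequence (windows with non-ACGT bases are skipped)
    variants:  0-based position -> alternate base observed at that site
    k:         odd context width; k=7 is a heptanucleotide context
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the site sits at the centre")
    flank = k // 2
    totals: Counter = Counter()  # occurrences of each folded context
    hits: Counter = Counter()    # (folded context, folded alt) counts
    for i in range(flank, len(reference) - flank):
        kmer = reference[i - flank : i + flank + 1].upper()
        if not set(kmer) <= BASES:
            continue  # skip windows containing N or other symbols
        # Assumed strand folding: report every context with an A or C centre.
        flipped = kmer[flank] in "GT"
        context = revcomp(kmer) if flipped else kmer
        totals[context] += 1
        alt = variants.get(i)
        if alt is not None:
            alt = alt.upper()
            if flipped:
                alt = alt.translate(COMPLEMENT)
            hits[(context, alt)] += 1
    # Substitution types never observed in a context are absent (i.e. 0).
    return {(ctx, alt): n / totals[ctx] for (ctx, alt), n in hits.items()}


if __name__ == "__main__":
    # Toy input with k=3 so the folding is easy to follow by hand.
    print(substitution_probabilities("ACGTACG", {1: "T", 2: "A"}, k=3))
```

In the toy input the G-to-A variant at position 2 sits in the `CGT` context, which folds onto the reverse strand as a C-to-T change in `ACG`. Both variants therefore count toward the same `('ACG', 'T')` entry, whose context occurs three times, giving a probability of 2/3. Folding strands this way halves the number of distinct contexts to estimate, at the cost of assuming the two strands mutate symmetrically.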