
An expanded sequence context model broadly explains variability in polymorphism levels across the human genome

The rate of single nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have only considered the immediate flanking nucleotides around a polymorphic site –the site’s trinucleotide sequence conte...


Bibliographic Details
Main authors: Aggarwala, Varun; Voight, Benjamin F.
Format: Online Article Text
Language: English
Published: 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4811712/
https://www.ncbi.nlm.nih.gov/pubmed/26878723
http://dx.doi.org/10.1038/ng.3511
_version_ 1782424010962763776
author Aggarwala, Varun
Voight, Benjamin F.
author_facet Aggarwala, Varun
Voight, Benjamin F.
author_sort Aggarwala, Varun
collection PubMed
description The rate of single nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have only considered the immediate flanking nucleotides around a polymorphic site – the site’s trinucleotide sequence context – to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, revealing new mutation-promoting motifs at ApT dinucleotide, CAAT, and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases.
format Online
Article
Text
id pubmed-4811712
institution National Center for Biotechnology Information
language English
publishDate 2016
record_format MEDLINE/PubMed
spelling pubmed-48117122016-08-15 An expanded sequence context model broadly explains variability in polymorphism levels across the human genome Aggarwala, Varun Voight, Benjamin F. Nat Genet Article The rate of single nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have only considered the immediate flanking nucleotides around a polymorphic site – the site’s trinucleotide sequence context – to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, revealing new mutation-promoting motifs at ApT dinucleotide, CAAT, and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases. 2016-02-15 2016-04 /pmc/articles/PMC4811712/ /pubmed/26878723 http://dx.doi.org/10.1038/ng.3511 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
spellingShingle Article
Aggarwala, Varun
Voight, Benjamin F.
An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title_full An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title_fullStr An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title_full_unstemmed An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title_short An expanded sequence context model broadly explains variability in polymorphism levels across the human genome
title_sort expanded sequence context model broadly explains variability in polymorphism levels across the human genome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4811712/
https://www.ncbi.nlm.nih.gov/pubmed/26878723
http://dx.doi.org/10.1038/ng.3511