A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets

Bibliographic Details
Main authors: Moon, Sunjin; Akey, Joshua M.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889975/
https://www.ncbi.nlm.nih.gov/pubmed/27197222
http://dx.doi.org/10.1101/gr.203059.115
author Moon, Sunjin
Akey, Joshua M.
collection PubMed
description A continuing challenge in the analysis of massively large sequencing data sets is quantifying and interpreting non-neutrally evolving mutations. Here, we describe a flexible and robust approach based on the site frequency spectrum to estimate the fraction of deleterious and adaptive variants from large-scale sequencing data sets. We applied our method to approximately 1 million single nucleotide variants (SNVs) identified in high-coverage exome sequences of 6515 individuals. We estimate that the fraction of deleterious nonsynonymous SNVs is higher than previously reported; quantify the effects of genomic context, codon bias, chromatin accessibility, and number of protein–protein interactions on deleterious protein-coding SNVs; and identify pathways and networks that have likely been influenced by positive selection. Furthermore, we show that the fraction of deleterious nonsynonymous SNVs is significantly higher for Mendelian versus complex disease loci and in exons harboring dominant versus recessive Mendelian mutations. In summary, as genome-scale sequencing data accumulate in progressively larger sample sizes, our method will enable increasingly high-resolution inferences into the characteristics and determinants of non-neutral variation.
format Online
Article
Text
id pubmed-4889975
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4889975 2016-12-01 Genome Res Method Cold Spring Harbor Laboratory Press 2016-06 /pmc/articles/PMC4889975/ /pubmed/27197222 http://dx.doi.org/10.1101/gr.203059.115 Text en © 2016 Moon and Akey; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml).
After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889975/
https://www.ncbi.nlm.nih.gov/pubmed/27197222
http://dx.doi.org/10.1101/gr.203059.115