
Optimal sequencing strategies for identifying disease-associated singletons

With the increasing focus of genetic association on the identification of trait-associated rare variants through sequencing, it is important to identify the most cost-effective sequencing strategies for these studies. Deep sequencing will accurately detect and genotype the most rare variants per ind...


Bibliographic Details
Main authors: Rashkin, Sara, Jun, Goo, Chen, Sai, Abecasis, Goncalo R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501675/
https://www.ncbi.nlm.nih.gov/pubmed/28640830
http://dx.doi.org/10.1371/journal.pgen.1006811
author Rashkin, Sara
Jun, Goo
Chen, Sai
Abecasis, Goncalo R.
collection PubMed
description With the increasing focus of genetic association on the identification of trait-associated rare variants through sequencing, it is important to identify the most cost-effective sequencing strategies for these studies. Deep sequencing will accurately detect and genotype the most rare variants per individual, but may limit sample size. Low pass sequencing will miss some variants in each individual but has been shown to provide a cost-effective alternative for studies of common variants. Here, we investigate the impact of sequencing depth on studies of rare variants, focusing on singletons—the variants that are sampled in a single individual and are hardest to detect at low sequencing depths. We first estimate the sensitivity to detect singleton variants in both simulated data and in down-sampled deep genome and exome sequence data. We then explore the power of association studies comparing burden of singleton variants in cases and controls under a variety of conditions. We show that the power to detect singletons increases with coverage, typically plateauing for coverage > ~25x. Next, we show that, when total sequencing capacity is fixed, the power of association studies focused on singletons is typically maximized for coverage of 15-20x, independent of relative risk, disease prevalence, singleton burden, and case-control ratio. Our results suggest sequencing depth of 15-20x as an appropriate compromise of singleton detection power and sample size for studies of rare variants in complex disease.
format Online
Article
Text
id pubmed-5501675
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5501675 2017-07-25 Optimal sequencing strategies for identifying disease-associated singletons Rashkin, Sara Jun, Goo Chen, Sai Abecasis, Goncalo R. PLoS Genet Research Article
Public Library of Science 2017-06-22 /pmc/articles/PMC5501675/ /pubmed/28640830 http://dx.doi.org/10.1371/journal.pgen.1006811 Text en © 2017 Rashkin et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Optimal sequencing strategies for identifying disease-associated singletons
topic Research Article