
The theory of discovering rare variants via DNA sequencing

BACKGROUND: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer...


Bibliographic Details
Main Authors: Wendl, Michael C, Wilson, Richard K
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778663/
https://www.ncbi.nlm.nih.gov/pubmed/19843339
http://dx.doi.org/10.1186/1471-2164-10-485
author Wendl, Michael C
Wilson, Richard K
collection PubMed
description BACKGROUND: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective. RESULTS: We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement. CONCLUSION: The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future.
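The scaling rules stated in the abstract can be illustrated with a minimal sketch. It assumes that project-wide redundancy equals sample size times per-sample redundancy, which this record does not state explicitly. The function name `scaled_design` and the parameters `sample_const` and `redundancy_const` are hypothetical; they stand in for the "second family of constants" the abstract mentions, whose actual values are not given here and would have to be taken from the paper. The numbers in the usage loop are placeholders only.

```python
def scaled_design(variant_freq, sample_const, redundancy_const):
    """Illustrate the inverse-proportionality rules from the abstract.

    variant_freq     -- desired variant frequency, in (0, 1]
    sample_const     -- HYPOTHETICAL constant: sample size = sample_const / variant_freq
    redundancy_const -- HYPOTHETICAL constant: total redundancy = redundancy_const / variant_freq

    Neither constant's value comes from this record; both are assumptions
    supplied by the caller.
    """
    if not 0.0 < variant_freq <= 1.0:
        raise ValueError("variant_freq must be in (0, 1]")
    if sample_const <= 0 or redundancy_const <= 0:
        raise ValueError("constants must be positive")

    # Both quantities are inversely proportional to the variant frequency.
    sample_size = sample_const / variant_freq
    total_redundancy = redundancy_const / variant_freq

    # ASSUMPTION: total redundancy = sample size * per-sample redundancy.
    # Under that assumption the frequency cancels, so the per-sample
    # redundancy is the same for every target frequency.
    per_sample_redundancy = total_redundancy / sample_size

    return {
        "sample_size": sample_size,
        "total_redundancy": total_redundancy,
        "per_sample_redundancy": per_sample_redundancy,
    }


if __name__ == "__main__":
    # Placeholder constants, chosen only to show the scaling behaviour.
    for freq in (0.01, 0.005, 0.001):
        print(freq, scaled_design(freq, sample_const=2.0, redundancy_const=8.0))
```

Under these assumptions, halving the target frequency doubles both the sample size and the project-wide redundancy while the per-sample redundancy stays fixed, which is the qualitative behaviour the abstract describes.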
format Text
id pubmed-2778663
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2778663 2009-11-18 The theory of discovering rare variants via DNA sequencing Wendl, Michael C Wilson, Richard K BMC Genomics Research article BioMed Central 2009-10-20 /pmc/articles/PMC2778663/ /pubmed/19843339 http://dx.doi.org/10.1186/1471-2164-10-485 Text en Copyright ©2009 Wendl and Wilson; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The theory of discovering rare variants via DNA sequencing
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778663/
https://www.ncbi.nlm.nih.gov/pubmed/19843339
http://dx.doi.org/10.1186/1471-2164-10-485