The theory of discovering rare variants via DNA sequencing
BACKGROUND: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer...
Main authors: | Wendl, Michael C; Wilson, Richard K |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778663/ https://www.ncbi.nlm.nih.gov/pubmed/19843339 http://dx.doi.org/10.1186/1471-2164-10-485 |
_version_ | 1782174284387450880 |
---|---|
author | Wendl, Michael C; Wilson, Richard K |
collection | PubMed |
description | BACKGROUND: Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective. RESULTS: We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement. CONCLUSION: The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future. |
format | Text |
id | pubmed-2778663 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2778663 2009-11-18 The theory of discovering rare variants via DNA sequencing Wendl, Michael C; Wilson, Richard K BMC Genomics Research article BioMed Central 2009-10-20 /pmc/articles/PMC2778663/ /pubmed/19843339 http://dx.doi.org/10.1186/1471-2164-10-485 Text en Copyright ©2009 Wendl and Wilson; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The theory of discovering rare variants via DNA sequencing |
topic | Research article |