Phasing for medical sequencing using rare variants and large haplotype reference panels
Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be availabl...
Main authors: | Sharp, Kevin; Kretzschmar, Warren; Delaneau, Olivier; Marchini, Jonathan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920110/ https://www.ncbi.nlm.nih.gov/pubmed/27153703 http://dx.doi.org/10.1093/bioinformatics/btw065 |
_version_ | 1782439351900176384 |
---|---|
author | Sharp, Kevin Kretzschmar, Warren Delaneau, Olivier Marchini, Jonathan |
author_facet | Sharp, Kevin Kretzschmar, Warren Delaneau, Olivier Marchini, Jonathan |
author_sort | Sharp, Kevin |
collection | PubMed |
description | Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence. Results: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset. Availability and implementation: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/. Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4920110 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-49201102016-06-27 Phasing for medical sequencing using rare variants and large haplotype reference panels Sharp, Kevin Kretzschmar, Warren Delaneau, Olivier Marchini, Jonathan Bioinformatics Original Papers Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence. Results: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset. 
Availability and implementation: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/. Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-07-01 2016-02-27 /pmc/articles/PMC4920110/ /pubmed/27153703 http://dx.doi.org/10.1093/bioinformatics/btw065 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Sharp, Kevin Kretzschmar, Warren Delaneau, Olivier Marchini, Jonathan Phasing for medical sequencing using rare variants and large haplotype reference panels |
title | Phasing for medical sequencing using rare variants and large haplotype reference panels |
title_full | Phasing for medical sequencing using rare variants and large haplotype reference panels |
title_fullStr | Phasing for medical sequencing using rare variants and large haplotype reference panels |
title_full_unstemmed | Phasing for medical sequencing using rare variants and large haplotype reference panels |
title_short | Phasing for medical sequencing using rare variants and large haplotype reference panels |
title_sort | phasing for medical sequencing using rare variants and large haplotype reference panels |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920110/ https://www.ncbi.nlm.nih.gov/pubmed/27153703 http://dx.doi.org/10.1093/bioinformatics/btw065 |