
Phasing for medical sequencing using rare variants and large haplotype reference panels

Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be availabl...


Bibliographic Details
Main authors: Sharp, Kevin; Kretzschmar, Warren; Delaneau, Olivier; Marchini, Jonathan
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920110/
https://www.ncbi.nlm.nih.gov/pubmed/27153703
http://dx.doi.org/10.1093/bioinformatics/btw065
_version_ 1782439351900176384
author Sharp, Kevin
Kretzschmar, Warren
Delaneau, Olivier
Marchini, Jonathan
collection PubMed
description Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence. Results: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset. Availability and implementation: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/. Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4920110
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4920110 2016-06-27 Phasing for medical sequencing using rare variants and large haplotype reference panels Sharp, Kevin Kretzschmar, Warren Delaneau, Olivier Marchini, Jonathan Bioinformatics Original Papers Oxford University Press 2016-07-01 2016-02-27 /pmc/articles/PMC4920110/ /pubmed/27153703 http://dx.doi.org/10.1093/bioinformatics/btw065 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Phasing for medical sequencing using rare variants and large haplotype reference panels
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920110/
https://www.ncbi.nlm.nih.gov/pubmed/27153703
http://dx.doi.org/10.1093/bioinformatics/btw065