
Comparison of phasing strategies for whole human genomes

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23...


Bibliographic Details
Main Authors: Choi, Yongwook, Chan, Agnes P., Kirkness, Ewen, Telenti, Amalio, Schork, Nicholas J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903673/
https://www.ncbi.nlm.nih.gov/pubmed/29621242
http://dx.doi.org/10.1371/journal.pgen.1007308
author Choi, Yongwook
Chan, Agnes P.
Kirkness, Ewen
Telenti, Amalio
Schork, Nicholas J.
collection PubMed
description Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
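Of the metrics named in the description, the switch error rate is the one most easily illustrated. The sketch below is a minimal illustration and not the authors' implementation: it assumes a single phase block, heterozygous sites listed in genomic order, and a hypothetical encoding in which each entry is the allele (0 or 1) carried on the first haplotype, with `None` marking a site the method left unphased. The function name `switch_error_rate` and its conventions are assumptions made for this example; the paper's exact definition may differ.

```python
from typing import Optional, Sequence


def switch_error_rate(truth: Sequence[int],
                      pred: Sequence[Optional[int]]) -> float:
    """Fraction of adjacent phased site pairs whose relative phase is wrong.

    truth: allele (0/1) on haplotype 1 at each heterozygous site (gold standard).
    pred:  same encoding for the prediction; None marks an unphased site.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must cover the same sites")

    # For each phased site, record whether the prediction has the same
    # orientation as the truth. Haplotype labels are arbitrary, so a
    # prediction that is flipped everywhere is still error-free.
    same = [t == p for t, p in zip(truth, pred) if p is not None]
    if len(same) < 2:
        return 0.0  # no adjacent pair to evaluate

    # A switch error is a change of orientation between consecutive sites.
    switches = sum(a != b for a, b in zip(same, same[1:]))
    return switches / (len(same) - 1)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0]
    pred = [0, 1, 0, 1, 0, 0]  # orientation flips after site 2 and after site 5
    print(switch_error_rate(truth, pred))
```

Comparing orientation between neighbouring sites, rather than counting mismatched alleles, means a single wrong join is counted once even when it inverts every downstream site.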
format Online
Article
Text
id pubmed-5903673
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5903673 2018-04-27 Comparison of phasing strategies for whole human genomes Choi, Yongwook Chan, Agnes P. Kirkness, Ewen Telenti, Amalio Schork, Nicholas J. PLoS Genet Research Article Public Library of Science 2018-04-05 /pmc/articles/PMC5903673/ /pubmed/29621242 http://dx.doi.org/10.1371/journal.pgen.1007308 Text en © 2018 Choi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Comparison of phasing strategies for whole human genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903673/
https://www.ncbi.nlm.nih.gov/pubmed/29621242
http://dx.doi.org/10.1371/journal.pgen.1007308