Discovery and genotyping of structural variation from long-read haploid genome sequence data

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural v...

Bibliographic Details
Main Authors: Huddleston, John, Chaisson, Mark J.P., Steinberg, Karyn Meltz, Warren, Wes, Hoekzema, Kendra, Gordon, David, Graves-Lindsay, Tina A., Munson, Katherine M., Kronenberg, Zev N., Vives, Laura, Peluso, Paul, Boitano, Matthew, Chin, Chen-Shin, Korlach, Jonas, Wilson, Richard K., Eichler, Evan E.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411763/
https://www.ncbi.nlm.nih.gov/pubmed/27895111
http://dx.doi.org/10.1101/gr.214007.116
_version_ 1783232860798320640
author Huddleston, John
Chaisson, Mark J.P.
Steinberg, Karyn Meltz
Warren, Wes
Hoekzema, Kendra
Gordon, David
Graves-Lindsay, Tina A.
Munson, Katherine M.
Kronenberg, Zev N.
Vives, Laura
Peluso, Paul
Boitano, Matthew
Chin, Chen-Shin
Korlach, Jonas
Wilson, Richard K.
Eichler, Evan E.
author_facet Huddleston, John
Chaisson, Mark J.P.
Steinberg, Karyn Meltz
Warren, Wes
Hoekzema, Kendra
Gordon, David
Graves-Lindsay, Tina A.
Munson, Katherine M.
Kronenberg, Zev N.
Vives, Laura
Peluso, Paul
Boitano, Matthew
Chin, Chen-Shin
Korlach, Jonas
Wilson, Richard K.
Eichler, Evan E.
author_sort Huddleston, John
collection PubMed
description In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
format Online
Article
Text
id pubmed-5411763
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-54117632017-11-01 Discovery and genotyping of structural variation from long-read haploid genome sequence data Huddleston, John Chaisson, Mark J.P. Steinberg, Karyn Meltz Warren, Wes Hoekzema, Kendra Gordon, David Graves-Lindsay, Tina A. Munson, Katherine M. Kronenberg, Zev N. Vives, Laura Peluso, Paul Boitano, Matthew Chin, Chen-Shin Korlach, Jonas Wilson, Richard K. Eichler, Evan E. Genome Res Research In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection. 
Cold Spring Harbor Laboratory Press 2017-05 /pmc/articles/PMC5411763/ /pubmed/27895111 http://dx.doi.org/10.1101/gr.214007.116 Text en © 2017 Huddleston et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle Research
Huddleston, John
Chaisson, Mark J.P.
Steinberg, Karyn Meltz
Warren, Wes
Hoekzema, Kendra
Gordon, David
Graves-Lindsay, Tina A.
Munson, Katherine M.
Kronenberg, Zev N.
Vives, Laura
Peluso, Paul
Boitano, Matthew
Chin, Chen-Shin
Korlach, Jonas
Wilson, Richard K.
Eichler, Evan E.
Discovery and genotyping of structural variation from long-read haploid genome sequence data
title Discovery and genotyping of structural variation from long-read haploid genome sequence data
title_full Discovery and genotyping of structural variation from long-read haploid genome sequence data
title_fullStr Discovery and genotyping of structural variation from long-read haploid genome sequence data
title_full_unstemmed Discovery and genotyping of structural variation from long-read haploid genome sequence data
title_short Discovery and genotyping of structural variation from long-read haploid genome sequence data
title_sort discovery and genotyping of structural variation from long-read haploid genome sequence data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411763/
https://www.ncbi.nlm.nih.gov/pubmed/27895111
http://dx.doi.org/10.1101/gr.214007.116