
Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing(1,2) with...


Bibliographic Details
Main authors: Porubsky, David, Ebert, Peter, Audano, Peter A., Vollger, Mitchell R., Harvey, William T., Marijon, Pierre, Ebler, Jana, Munson, Katherine M., Sorensen, Melanie, Sulovari, Arvis, Haukness, Marina, Ghareghani, Maryam, Lansdorp, Peter M., Paten, Benedict, Devine, Scott E., Sanders, Ashley D., Lee, Charles, Chaisson, Mark J. P., Korbel, Jan O., Eichler, Evan E., Marschall, Tobias
Format: Online Article Text
Language: English
Published: Nature Publishing Group US 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7954704/
https://www.ncbi.nlm.nih.gov/pubmed/33288906
http://dx.doi.org/10.1038/s41587-020-0719-5
collection PubMed
description Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing(1,2) with continuous long-read or high-fidelity(3) sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
format Online
Article
Text
id pubmed-7954704
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group US
record_format MEDLINE/PubMed
spelling pubmed-7954704 2021-03-28 Nat Biotechnol Letter
Nature Publishing Group US 2020-12-07 2021 /pmc/articles/PMC7954704/ /pubmed/33288906 http://dx.doi.org/10.1038/s41587-020-0719-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
topic Letter