Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing(1,2) with...
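Main authors: | Porubsky, David; Ebert, Peter; Audano, Peter A.; Vollger, Mitchell R.; Harvey, William T.; Marijon, Pierre; Ebler, Jana; Munson, Katherine M.; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Lansdorp, Peter M.; Paten, Benedict; Devine, Scott E.; Sanders, Ashley D.; Lee, Charles; Chaisson, Mark J. P.; Korbel, Jan O.; Eichler, Evan E.; Marschall, Tobias |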
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group US, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7954704/ https://www.ncbi.nlm.nih.gov/pubmed/33288906 http://dx.doi.org/10.1038/s41587-020-0719-5 |
_version_ | 1783664130142502912 |
---|---|
author | Porubsky, David; Ebert, Peter; Audano, Peter A.; Vollger, Mitchell R.; Harvey, William T.; Marijon, Pierre; Ebler, Jana; Munson, Katherine M.; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Lansdorp, Peter M.; Paten, Benedict; Devine, Scott E.; Sanders, Ashley D.; Lee, Charles; Chaisson, Mark J. P.; Korbel, Jan O.; Eichler, Evan E.; Marschall, Tobias |
author_facet | Porubsky, David; Ebert, Peter; Audano, Peter A.; Vollger, Mitchell R.; Harvey, William T.; Marijon, Pierre; Ebler, Jana; Munson, Katherine M.; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Lansdorp, Peter M.; Paten, Benedict; Devine, Scott E.; Sanders, Ashley D.; Lee, Charles; Chaisson, Mark J. P.; Korbel, Jan O.; Eichler, Evan E.; Marschall, Tobias |
author_sort | Porubsky, David |
collection | PubMed |
description | Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing(1,2) with continuous long-read or high-fidelity(3) sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms. |
format | Online Article Text |
id | pubmed-7954704 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group US |
record_format | MEDLINE/PubMed |
spelling | pubmed-79547042021-03-28 Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads Porubsky, David Ebert, Peter Audano, Peter A. Vollger, Mitchell R. Harvey, William T. Marijon, Pierre Ebler, Jana Munson, Katherine M. Sorensen, Melanie Sulovari, Arvis Haukness, Marina Ghareghani, Maryam Lansdorp, Peter M. Paten, Benedict Devine, Scott E. Sanders, Ashley D. Lee, Charles Chaisson, Mark J. P. Korbel, Jan O. Eichler, Evan E. Marschall, Tobias Nat Biotechnol Letter Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing(1,2) with continuous long-read or high-fidelity(3) sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms. 
Nature Publishing Group US 2020-12-07 2021 /pmc/articles/PMC7954704/ /pubmed/33288906 http://dx.doi.org/10.1038/s41587-020-0719-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Letter Porubsky, David Ebert, Peter Audano, Peter A. Vollger, Mitchell R. Harvey, William T. Marijon, Pierre Ebler, Jana Munson, Katherine M. Sorensen, Melanie Sulovari, Arvis Haukness, Marina Ghareghani, Maryam Lansdorp, Peter M. Paten, Benedict Devine, Scott E. Sanders, Ashley D. Lee, Charles Chaisson, Mark J. P. Korbel, Jan O. Eichler, Evan E. Marschall, Tobias Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title | Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title_full | Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title_fullStr | Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title_full_unstemmed | Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title_short | Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
title_sort | fully phased human genome assembly without parental data using single-cell strand sequencing and long reads |
topic | Letter |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7954704/ https://www.ncbi.nlm.nih.gov/pubmed/33288906 http://dx.doi.org/10.1038/s41587-020-0719-5 |
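
The quality metrics quoted in the description (contig N50 and quality value) can be made concrete with a small sketch. The Python below computes an N50 from a list of contig lengths and converts a Phred-scaled quality value to a per-base error rate, assuming the usual definition QV = -10 * log10(error rate). The function names and the toy contig lengths are hypothetical and are not taken from the article or its pipeline.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig to the shortest, the running total first reaches at least half
    of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate.

    Assumes QV = -10 * log10(error_rate).
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse of qv_to_error_rate under the same assumption."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Hypothetical contig lengths in Mbp, for illustration only.
    toy_contigs_mbp = [10, 40, 20, 30]
    print("N50 (Mbp):", n50(toy_contigs_mbp))
    print("Error rate at QV 40:", qv_to_error_rate(40))
```

Under this definition, a quality value above 40 corresponds to fewer than one error per 10,000 bases.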