
Targeted de novo phasing and long-range assembly by template mutagenesis

Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain...


Bibliographic Details
Main authors: Li, Siran, Park, Sarah, Ye, Catherine, Danyko, Cassidy, Wroten, Matthew, Andrews, Peter, Wigler, Michael, Levy, Dan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561374/
https://www.ncbi.nlm.nih.gov/pubmed/35822882
http://dx.doi.org/10.1093/nar/gkac592
author Li, Siran
Park, Sarah
Ye, Catherine
Danyko, Cassidy
Wroten, Matthew
Andrews, Peter
Wigler, Michael
Levy, Dan
author_sort Li, Siran
collection PubMed
description Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
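The mutation step in the description above (each cytosine converted to uracil independently with probability of about 50%) can be illustrated with a small simulation. This is a minimal sketch and not the authors' protocol or software: the function names `mutate_template`, `mutation_pattern`, and `collision_probability` are hypothetical, only a single strand is modeled, and the sketch assumes a converted cytosine is read as T after amplification and sequencing. It also shows why the pattern is effectively unique per template: under these assumptions, two independently mutated templates agree at all n cytosines with probability 0.5^n.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Convert each C to U (written here as T, as a read would show it)
    independently with probability `rate`. Hypothetical helper."""
    rng = rng or random.Random()
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def mutation_pattern(original, mutated):
    """Return one flag per cytosine of `original`: 1 if converted, else 0."""
    return tuple(
        int(o != m) for o, m in zip(original, mutated) if o == "C"
    )


def collision_probability(n_cytosines, rate=0.5):
    """Probability that two independently mutated templates carry the
    same pattern over `n_cytosines` sites."""
    per_site = rate ** 2 + (1 - rate) ** 2  # both converted or both not
    return per_site ** n_cytosines


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACGTCCGATCGACCTAGC"  # made-up sequence for illustration
    for _ in range(3):
        template = mutate_template(target, rng=rng)
        print(template, mutation_pattern(target, template))
    print(collision_probability(target.count("C")))
```

Even a short stretch carries enough cytosines for the collision probability to become very small, which is consistent with the description's claim that reads can be assigned back to individual templates by their mutation pattern.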
format Online
Article
Text
id pubmed-9561374
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9561374 2022-10-18 Targeted de novo phasing and long-range assembly by template mutagenesis Li, Siran Park, Sarah Ye, Catherine Danyko, Cassidy Wroten, Matthew Andrews, Peter Wigler, Michael Levy, Dan Nucleic Acids Res Methods Online Oxford University Press 2022-07-13 /pmc/articles/PMC9561374/ /pubmed/35822882 http://dx.doi.org/10.1093/nar/gkac592 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Targeted de novo phasing and long-range assembly by template mutagenesis
topic Methods Online