JTK: targeted diploid genome assembler

MOTIVATION: Diploid assembly, or determining sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large...

Bibliographic Details
Main Authors:	Masutani, Bansho; Suzuki, Yoshihiko; Suzuki, Yuta; Morishita, Shinichi
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10320103/ https://www.ncbi.nlm.nih.gov/pubmed/37354526 http://dx.doi.org/10.1093/bioinformatics/btad398

_version_	1785068378338099200
author	Masutani, Bansho; Suzuki, Yoshihiko; Suzuki, Yuta; Morishita, Shinichi
author_facet	Masutani, Bansho; Suzuki, Yoshihiko; Suzuki, Yuta; Morishita, Shinichi
author_sort	Masutani, Bansho
collection	PubMed
description	MOTIVATION: Diploid assembly, or determining sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads deriving from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase homozygous regions longer than their length and require Oxford Nanopore Technologies (ONT) reads or Hi-C to produce a fully phased assembly. Is a single long-read sequencing technology sufficient to create an accurate diploid assembly? RESULTS: Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called ‘chunks’) from the long reads, phases variants found on them, and produces two haplotypes. The novel idea of JTK is to utilize chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads of HG002 and a Japanese sample, it fully assembled two haplotypes with approximately 99.9% accuracy on the major histocompatibility complex (MHC) and the leukocyte receptor complex (LRC) regions, which was not possible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK output an assembly of better contiguity than those built from high-coverage HiFi+Hi-C. In the coming age of pan-genomics, JTK would complement the reference-based phasing method to assemble the difficult-to-assemble but medically important regions. AVAILABILITY AND IMPLEMENTATION: JTK is available at https://github.com/ban-m/jtk, and the datasets are available at https://doi.org/10.5281/zenodo.7790310 or JGAS000580 in DDBJ.
format	Online Article Text
id	pubmed-10320103
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10320103 2023-07-06 JTK: targeted diploid genome assembler Masutani, Bansho; Suzuki, Yoshihiko; Suzuki, Yuta; Morishita, Shinichi Bioinformatics Original Paper MOTIVATION: Diploid assembly, or determining sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads deriving from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase homozygous regions longer than their length and require Oxford Nanopore Technologies (ONT) reads or Hi-C to produce a fully phased assembly. Is a single long-read sequencing technology sufficient to create an accurate diploid assembly? RESULTS: Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called ‘chunks’) from the long reads, phases variants found on them, and produces two haplotypes. The novel idea of JTK is to utilize chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads of HG002 and a Japanese sample, it fully assembled two haplotypes with approximately 99.9% accuracy on the major histocompatibility complex (MHC) and the leukocyte receptor complex (LRC) regions, which was not possible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK output an assembly of better contiguity than those built from high-coverage HiFi+Hi-C. In the coming age of pan-genomics, JTK would complement the reference-based phasing method to assemble the difficult-to-assemble but medically important regions. AVAILABILITY AND IMPLEMENTATION: JTK is available at https://github.com/ban-m/jtk, and the datasets are available at https://doi.org/10.5281/zenodo.7790310 or JGAS000580 in DDBJ. Oxford University Press 2023-06-24 /pmc/articles/PMC10320103/ /pubmed/37354526 http://dx.doi.org/10.1093/bioinformatics/btad398 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Paper Masutani, Bansho; Suzuki, Yoshihiko; Suzuki, Yuta; Morishita, Shinichi JTK: targeted diploid genome assembler
title	JTK: targeted diploid genome assembler
title_full	JTK: targeted diploid genome assembler
title_fullStr	JTK: targeted diploid genome assembler
title_full_unstemmed	JTK: targeted diploid genome assembler
title_short	JTK: targeted diploid genome assembler
title_sort	jtk: targeted diploid genome assembler
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10320103/ https://www.ncbi.nlm.nih.gov/pubmed/37354526 http://dx.doi.org/10.1093/bioinformatics/btad398
