
KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods

BACKGROUND: KOREF is the Korean reference genome, which was constructed with various sequencing technologies including long reads, short reads, and optical mapping methods. It is also the first East Asian multiomic reference genome accompanied by extensive clinical information, time-series and multi...


Bibliographic Details
Main Authors: Kim, Hui-su; Jeon, Sungwon; Kim, Yeonkyung; Kim, Changjae; Bhak, Jihun; Bhak, Jong
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8952264/
https://www.ncbi.nlm.nih.gov/pubmed/35333300
http://dx.doi.org/10.1093/gigascience/giac022
author Kim, Hui-su
Jeon, Sungwon
Kim, Yeonkyung
Kim, Changjae
Bhak, Jihun
Bhak, Jong
author_sort Kim, Hui-su
collection PubMed
description BACKGROUND: KOREF is the Korean reference genome, which was constructed with various sequencing technologies including long reads, short reads, and optical mapping methods. It is also the first East Asian multiomic reference genome accompanied by extensive clinical information, time-series and multiomic data, and parental sequencing data. However, it was still not a chromosome-scale reference. Here, we updated the previous KOREF assembly to a new chromosome-level haploid assembly of KOREF, KOREF_S1v2.1. Oxford Nanopore Technologies (ONT) PromethION, Pacific Biosciences HiFi-CCS, and Hi-C technology were used to build the most accurate East Asian reference assembled so far. RESULTS: We produced 705 Gb ONT reads and 114 Gb Pacific Biosciences HiFi reads, and corrected ONT reads by Pacific Biosciences reads. The corrected ultra-long reads reached higher accuracy of 1.4% base errors than the previous KOREF_S1v1.0, which was mainly built with short reads. KOREF has parental genome information, and we successfully phased it using a trio-binning method, acquiring a near-complete haploid-assembly. The final assembly resulted in total length of 2.9 Gb with an N50 of 150 Mb, and the longest scaffold covered 97.3% of GRCh38’s chromosome 2. In addition, the final assembly showed high base accuracy, with <0.01% base errors. CONCLUSIONS: KOREF_S1v2.1 is the first chromosome-scale haploid assembly of the Korean reference genome with high contiguity and accuracy. Our study provides useful resources of the Korean reference genome and demonstrates a new strategy of hybrid assembly that combines ONT's PromethION and PacBio's HiFi-CCS.
format Online
Article
Text
id pubmed-8952264
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8952264 2022-03-28 KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods Kim, Hui-su; Jeon, Sungwon; Kim, Yeonkyung; Kim, Changjae; Bhak, Jihun; Bhak, Jong GigaScience Data Note
Oxford University Press 2022-03-24 /pmc/articles/PMC8952264/ /pubmed/35333300 http://dx.doi.org/10.1093/gigascience/giac022 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods
topic Data Note