
LT1, an ONT long-read-based assembly scaffolded with Hi-C data and polished with short reads

Bibliographic Details
Main Authors: Kim, Hui-Su; Blazyte, Asta; Jeon, Sungwon; Yoon, Changhan; Kim, Yeonkyung; Kim, Changjae; Bolser, Dan; Ahn, Ji-Hye; Edwards, Jeremy S.; Bhak, Jong
Format: Online Article Text
Language: English
Published: GigaScience Press 2022
Subjects: Data Release
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9650228/
https://www.ncbi.nlm.nih.gov/pubmed/36824523
http://dx.doi.org/10.46471/gigabyte.51
collection PubMed
description We present LT1, the first high-quality human reference genome from the Baltic States. LT1 is a female de novo human reference genome assembly, constructed using 57× nanopore long reads and polished using 47× short paired-end reads. We utilized 72 GB of Hi-C chromosomal mapping data for scaffolding, to maximize assembly contiguity and accuracy. The contig assembly of LT1 was 2.73 Gbp in length, comprising 4490 contigs with an NG50 value of 12.0 Mbp. After scaffolding with Hi-C data and manual curation, the final assembly has an NG50 value of 137 Mbp and 4699 scaffolds. Assessment of gene prediction quality using Benchmarking Universal Single-Copy Orthologs (BUSCO) identified 89.3% of the single-copy orthologous genes included in the benchmark. Detailed characterization of LT1 suggests it has 73,744 predicted transcripts, 4.2 million autosomal SNPs, 974,616 short indels, and 12,079 large structural variants. These data may be used as a benchmark for further in-depth genomic analyses of Baltic populations.
format Online
Article
Text
id pubmed-9650228
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher GigaScience Press
record_format MEDLINE/PubMed
spelling pubmed-9650228 2023-02-22 GigaByte Data Release GigaScience Press 2022-05-04 Text en © The Author(s) 2022. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title LT1, an ONT long-read-based assembly scaffolded with Hi-C data and polished with short reads
topic Data Release