A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions

The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent the ethnic genome, specifically for Asians, though exactly what we are missing is still elusive. Here, we juxtaposed GRCh38 with a high-c...

Bibliographic Details
Main authors: Kim, Jina, Sung, Joohon, Han, Kyudong, Lee, Wooseok, Mun, Seyoung, Lee, Jooyeon, Bahk, Kunhyung, Yang, Inchul, Bae, Young-Kyung, Kim, Changhoon, Kim, Jong-Il, Seo, Jeong-Sun
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7697454/
https://www.ncbi.nlm.nih.gov/pubmed/33202901
http://dx.doi.org/10.3390/genes11111350
_version_ 1783615604448559104
author Kim, Jina
Sung, Joohon
Han, Kyudong
Lee, Wooseok
Mun, Seyoung
Lee, Jooyeon
Bahk, Kunhyung
Yang, Inchul
Bae, Young-Kyung
Kim, Changhoon
Kim, Jong-Il
Seo, Jeong-Sun
author_sort Kim, Jina
collection PubMed
description The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent the ethnic genome, specifically for Asians, though exactly what we are missing is still elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean (AK1) to show that part of the AK1 genome is missing in GRCh38 and that the missing regions harbored ~1390 putative coding elements. Furthermore, we found that multiple populations shared certain parts of the missing genome when we analyzed the “unmapped” (to GRCh38) reads of fourteen individuals (five East-Asians, four Europeans, and five Africans), amounting to ~5.3 Mb (~0.2% of AK1) of the total genomic regions. The recovered AK1 regions from the “unmapped reads”, which were the estimated missing regions that did not exist in GRCh38, harbored candidate coding elements. We verified that most of the common (shared by ≥7 individuals) missing regions exist in human and chimpanzee DNA. We further identified the occurrence mechanism and ethnic heterogeneity as well as the presence of the common missing regions. This study illuminates a potential advantage of using a pangenome reference and brings up the need for further investigations on the various features of regions globally missed in GRCh38.
format Online
Article
Text
id pubmed-7697454
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7697454 2020-11-29 Genes (Basel) Article MDPI 2020-11-13 /pmc/articles/PMC7697454/ /pubmed/33202901 http://dx.doi.org/10.3390/genes11111350 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions
topic Article