A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions
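Main authors: | Kim, Jina; Sung, Joohon; Han, Kyudong; Lee, Wooseok; Mun, Seyoung; Lee, Jooyeon; Bahk, Kunhyung; Yang, Inchul; Bae, Young-Kyung; Kim, Changhoon; Kim, Jong-Il; Seo, Jeong-Sun |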
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7697454/ https://www.ncbi.nlm.nih.gov/pubmed/33202901 http://dx.doi.org/10.3390/genes11111350 |
_version_ | 1783615604448559104 |
---|---|
author | Kim, Jina; Sung, Joohon; Han, Kyudong; Lee, Wooseok; Mun, Seyoung; Lee, Jooyeon; Bahk, Kunhyung; Yang, Inchul; Bae, Young-Kyung; Kim, Changhoon; Kim, Jong-Il; Seo, Jeong-Sun |
collection | PubMed |
description | The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent ethnic genomes, specifically Asian genomes, though exactly what is missing remains elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean individual (AK1) to show that part of the AK1 genome is missing from GRCh38 and that the missing regions harbored ~1390 putative coding elements. Furthermore, when we analyzed the reads of fourteen individuals (five East Asians, four Europeans, and five Africans) that did not map to GRCh38, we found that multiple populations shared certain parts of the missing genome, amounting to ~5.3 Mb in total (~0.2% of AK1). The AK1 regions recovered from these “unmapped” reads, which we estimated to be absent from GRCh38, harbored candidate coding elements. We verified that most of the common missing regions (those shared by ≥7 individuals) exist in human and chimpanzee DNA. We further identified the mechanisms by which the common missing regions arose and their ethnic heterogeneity, as well as their presence. This study illuminates a potential advantage of using a pangenome reference and highlights the need for further investigation of the various features of regions globally missed in GRCh38. |
format | Online Article Text |
id | pubmed-7697454 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7697454, 2020-11-29. Genes (Basel), Article. Published by MDPI on 2020-11-13. © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7697454/ https://www.ncbi.nlm.nih.gov/pubmed/33202901 http://dx.doi.org/10.3390/genes11111350 |