Building a Chinese pan-genome of 486 individuals

Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genome...

Bibliographic Details
Main Authors: Li, Qiuhui, Tian, Shilin, Yan, Bin, Liu, Chi Man, Lam, Tak-Wah, Li, Ruiqiang, Luo, Ruibang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8405635/
https://www.ncbi.nlm.nih.gov/pubmed/34462542
http://dx.doi.org/10.1038/s42003-021-02556-6
_version_ 1783746368078086144
author Li, Qiuhui
Tian, Shilin
Yan, Bin
Liu, Chi Man
Lam, Tak-Wah
Li, Ruiqiang
Luo, Ruibang
author_facet Li, Qiuhui
Tian, Shilin
Yan, Bin
Liu, Chi Man
Lam, Tak-Wah
Li, Ruiqiang
Luo, Ruibang
author_sort Li, Qiuhui
collection PubMed
description Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.
format Online
Article
Text
id pubmed-8405635
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-84056352021-09-22 Building a Chinese pan-genome of 486 individuals Li, Qiuhui Tian, Shilin Yan, Bin Liu, Chi Man Lam, Tak-Wah Li, Ruiqiang Luo, Ruibang Commun Biol Article Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference. Nature Publishing Group UK 2021-08-30 /pmc/articles/PMC8405635/ /pubmed/34462542 http://dx.doi.org/10.1038/s42003-021-02556-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Li, Qiuhui
Tian, Shilin
Yan, Bin
Liu, Chi Man
Lam, Tak-Wah
Li, Ruiqiang
Luo, Ruibang
Building a Chinese pan-genome of 486 individuals
title Building a Chinese pan-genome of 486 individuals
title_full Building a Chinese pan-genome of 486 individuals
title_fullStr Building a Chinese pan-genome of 486 individuals
title_full_unstemmed Building a Chinese pan-genome of 486 individuals
title_short Building a Chinese pan-genome of 486 individuals
title_sort building a chinese pan-genome of 486 individuals
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8405635/
https://www.ncbi.nlm.nih.gov/pubmed/34462542
http://dx.doi.org/10.1038/s42003-021-02556-6