
Deep whole-genome sequencing of 90 Han Chinese genomes

Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided...


Bibliographic Details
Main Authors: Lan, Tianming, Lin, Haoxiang, Zhu, Wenjuan, Laurent, Tellier Christian Asker Melchior, Yang, Mengcheng, Liu, Xin, Wang, Jun, Wang, Jian, Yang, Huanming, Xu, Xun, Guo, Xiaosen
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603764/
https://www.ncbi.nlm.nih.gov/pubmed/28938720
http://dx.doi.org/10.1093/gigascience/gix067
_version_ 1783264766276403200
author Lan, Tianming
Lin, Haoxiang
Zhu, Wenjuan
Laurent, Tellier Christian Asker Melchior
Yang, Mengcheng
Liu, Xin
Wang, Jun
Wang, Jian
Yang, Huanming
Xu, Xun
Guo, Xiaosen
author_facet Lan, Tianming
Lin, Haoxiang
Zhu, Wenjuan
Laurent, Tellier Christian Asker Melchior
Yang, Mengcheng
Liu, Xin
Wang, Jun
Wang, Jian
Yang, Huanming
Xu, Xun
Guo, Xiaosen
author_sort Lan, Tianming
collection PubMed
description Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome.
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects.
format Online
Article
Text
id pubmed-5603764
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5603764 2017-09-25 Deep whole-genome sequencing of 90 Han Chinese genomes Lan, Tianming Lin, Haoxiang Zhu, Wenjuan Laurent, Tellier Christian Asker Melchior Yang, Mengcheng Liu, Xin Wang, Jun Wang, Jian Yang, Huanming Xu, Xun Guo, Xiaosen Gigascience Data Note Oxford University Press 2017-07-31 /pmc/articles/PMC5603764/ /pubmed/28938720 http://dx.doi.org/10.1093/gigascience/gix067 Text en © The Authors 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Data Note
Lan, Tianming
Lin, Haoxiang
Zhu, Wenjuan
Laurent, Tellier Christian Asker Melchior
Yang, Mengcheng
Liu, Xin
Wang, Jun
Wang, Jian
Yang, Huanming
Xu, Xun
Guo, Xiaosen
Deep whole-genome sequencing of 90 Han Chinese genomes
title Deep whole-genome sequencing of 90 Han Chinese genomes
title_full Deep whole-genome sequencing of 90 Han Chinese genomes
title_fullStr Deep whole-genome sequencing of 90 Han Chinese genomes
title_full_unstemmed Deep whole-genome sequencing of 90 Han Chinese genomes
title_short Deep whole-genome sequencing of 90 Han Chinese genomes
title_sort deep whole-genome sequencing of 90 han chinese genomes
topic Data Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603764/
https://www.ncbi.nlm.nih.gov/pubmed/28938720
http://dx.doi.org/10.1093/gigascience/gix067
work_keys_str_mv AT lantianming deepwholegenomesequencingof90hanchinesegenomes
AT linhaoxiang deepwholegenomesequencingof90hanchinesegenomes
AT zhuwenjuan deepwholegenomesequencingof90hanchinesegenomes
AT laurenttellierchristianaskermelchior deepwholegenomesequencingof90hanchinesegenomes
AT yangmengcheng deepwholegenomesequencingof90hanchinesegenomes
AT liuxin deepwholegenomesequencingof90hanchinesegenomes
AT wangjun deepwholegenomesequencingof90hanchinesegenomes
AT wangjian deepwholegenomesequencingof90hanchinesegenomes
AT yanghuanming deepwholegenomesequencingof90hanchinesegenomes
AT xuxun deepwholegenomesequencingof90hanchinesegenomes
AT guoxiaosen deepwholegenomesequencingof90hanchinesegenomes