Comprehensive Characterization of Human Genome Variation by High Coverage Whole-Genome Sequencing of Forty Four Caucasians

Bibliographic Details
Main Authors: Shen, Hui; Li, Jian; Zhang, Jigang; Xu, Chao; Jiang, Yan; Wu, Zikai; Zhao, Fuping; Liao, Li; Chen, Jun; Lin, Yong; Tian, Qing; Papasian, Christopher J.; Deng, Hong-Wen
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3618277/
https://www.ncbi.nlm.nih.gov/pubmed/23577066
http://dx.doi.org/10.1371/journal.pone.0059494
author Shen, Hui
Li, Jian
Zhang, Jigang
Xu, Chao
Jiang, Yan
Wu, Zikai
Zhao, Fuping
Liao, Li
Chen, Jun
Lin, Yong
Tian, Qing
Papasian, Christopher J.
Deng, Hong-Wen
author_sort Shen, Hui
collection PubMed
description Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variation. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions (indels), and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely “knock out” the corresponding genes. Across all 44 genomes, a total of 182 genes were “knocked out” in at least one individual genome, among which 46 genes were “knocked out” in over 30% of our samples, suggesting that a number of genes are commonly “knocked out” in general populations. Gene ontology analysis suggested that these commonly “knocked out” genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.
format Online
Article
Text
id pubmed-3618277
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3618277 2013-04-10 PLoS One Research Article Public Library of Science 2013-04-05 /pmc/articles/PMC3618277/ /pubmed/23577066 http://dx.doi.org/10.1371/journal.pone.0059494 Text en © 2013 Shen et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Comprehensive Characterization of Human Genome Variation by High Coverage Whole-Genome Sequencing of Forty Four Caucasians
topic Research Article