
Extremely low-coverage whole genome sequencing in South Asians captures population genomics information

BACKGROUND: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x hav...


Bibliographic Details
Main authors: Rustagi, Navin; Zhou, Anbo; Watkins, W. Scott; Gedvilaite, Erika; Wang, Shuoguo; Ramesh, Naveen; Muzny, Donna; Gibbs, Richard A.; Jorde, Lynn B.; Yu, Fuli; Xing, Jinchuan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5440948/
https://www.ncbi.nlm.nih.gov/pubmed/28532386
http://dx.doi.org/10.1186/s12864-017-3767-6
author Rustagi, Navin
Zhou, Anbo
Watkins, W. Scott
Gedvilaite, Erika
Wang, Shuoguo
Ramesh, Naveen
Muzny, Donna
Gibbs, Richard A.
Jorde, Lynn B.
Yu, Fuli
Xing, Jinchuan
author_sort Rustagi, Navin
collection PubMed
description BACKGROUND: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3767-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5440948
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5440948 2017-05-24 BMC Genomics Research Article BioMed Central 2017-05-22 /pmc/articles/PMC5440948/ /pubmed/28532386 http://dx.doi.org/10.1186/s12864-017-3767-6 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5440948/
https://www.ncbi.nlm.nih.gov/pubmed/28532386
http://dx.doi.org/10.1186/s12864-017-3767-6