
Incorporating ENCODE information into association analysis of whole genome sequencing data

With the rapidly decreasing cost of the next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functi...


Bibliographic Details
Main Authors: Kim, Taebeom; Wei, Peng
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133533/
https://www.ncbi.nlm.nih.gov/pubmed/27980646
http://dx.doi.org/10.1186/s12919-016-0040-y
_version_ 1782471282922618880
author Kim, Taebeom
Wei, Peng
author_facet Kim, Taebeom
Wei, Peng
author_sort Kim, Taebeom
collection PubMed
description With the rapidly decreasing cost of next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functional variants associated with complex diseases from whole genome sequencing (WGS) data because there are millions of candidate variants but only moderate sample sizes. We propose to incorporate Encyclopedia of DNA Elements (ENCODE) information in the association analysis of WGS data to boost statistical power. We use the RegulomeDB and PolyPhen2 scores as external weights in existing rare-variant association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies provided by the Genetic Analysis Workshop 19. We identified a genome-wide significant locus in the gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant. This locus was not detected by benchmark methods that did not incorporate biological information, including the T5 burden test and the sequence kernel association test.
format Online
Article
Text
id pubmed-5133533
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5133533 2016-12-15 Incorporating ENCODE information into association analysis of whole genome sequencing data Kim, Taebeom Wei, Peng BMC Proc Proceedings BioMed Central 2016-10-18 /pmc/articles/PMC5133533/ /pubmed/27980646 http://dx.doi.org/10.1186/s12919-016-0040-y Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Incorporating ENCODE information into association analysis of whole genome sequencing data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133533/
https://www.ncbi.nlm.nih.gov/pubmed/27980646
http://dx.doi.org/10.1186/s12919-016-0040-y