Incorporating ENCODE information into association analysis of whole genome sequencing data
With the rapidly decreasing cost of the next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functi...
Main Authors: | Kim, Taebeom; Wei, Peng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133533/ https://www.ncbi.nlm.nih.gov/pubmed/27980646 http://dx.doi.org/10.1186/s12919-016-0040-y |
_version_ | 1782471282922618880 |
---|---|
author | Kim, Taebeom Wei, Peng |
author_facet | Kim, Taebeom Wei, Peng |
author_sort | Kim, Taebeom |
collection | PubMed |
description | With the rapidly decreasing cost of the next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functional variants associated with complex diseases from whole genome sequencing (WGS) data because of the millions of candidate variants and yet moderate sample size. We propose to incorporate the Encyclopedia of DNA Elements (ENCODE) information in the association analysis of WGS data to boost the statistical power. We use the RegulomeDB and PolyPhen2 scores as external weights in existing rare variants association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies provided by the Genetic Analysis Workshop 19. We identified a genome-wide significant locus in gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant, which was not detected by benchmark methods that did not incorporate biological information, including the T5 burden test and sequence kernel association test. |
format | Online Article Text |
id | pubmed-5133533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5133533 2016-12-15 Incorporating ENCODE information into association analysis of whole genome sequencing data Kim, Taebeom Wei, Peng BMC Proc Proceedings With the rapidly decreasing cost of the next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functional variants associated with complex diseases from whole genome sequencing (WGS) data because of the millions of candidate variants and yet moderate sample size. We propose to incorporate the Encyclopedia of DNA Elements (ENCODE) information in the association analysis of WGS data to boost the statistical power. We use the RegulomeDB and PolyPhen2 scores as external weights in existing rare variants association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies provided by the Genetic Analysis Workshop 19. We identified a genome-wide significant locus in gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant, which was not detected by benchmark methods that did not incorporate biological information, including the T5 burden test and sequence kernel association test. BioMed Central 2016-10-18 /pmc/articles/PMC5133533/ /pubmed/27980646 http://dx.doi.org/10.1186/s12919-016-0040-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Kim, Taebeom Wei, Peng Incorporating ENCODE information into association analysis of whole genome sequencing data |
title | Incorporating ENCODE information into association analysis of whole genome sequencing data |
title_full | Incorporating ENCODE information into association analysis of whole genome sequencing data |
title_fullStr | Incorporating ENCODE information into association analysis of whole genome sequencing data |
title_full_unstemmed | Incorporating ENCODE information into association analysis of whole genome sequencing data |
title_short | Incorporating ENCODE information into association analysis of whole genome sequencing data |
title_sort | incorporating encode information into association analysis of whole genome sequencing data |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133533/ https://www.ncbi.nlm.nih.gov/pubmed/27980646 http://dx.doi.org/10.1186/s12919-016-0040-y |
work_keys_str_mv | AT kimtaebeom incorporatingencodeinformationintoassociationanalysisofwholegenomesequencingdata AT weipeng incorporatingencodeinformationintoassociationanalysisofwholegenomesequencingdata |
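
The description above mentions using RegulomeDB and PolyPhen2 scores as external weights in rare-variant association tests. The following is a minimal sketch of what a weighted burden test can look like, not the authors' implementation. It assumes unrelated individuals (the San Antonio Family Studies data are pedigree-based and would need a model that accounts for relatedness), a quantitative phenotype, and no covariates. The function name `weighted_burden_test` and the weight values are hypothetical; this record does not state how the functional scores were mapped to weights.

```python
import math


def weighted_burden_test(genotypes, weights, phenotype):
    """Illustrative weighted burden test for unrelated samples.

    genotypes: list of rows, one per sample, each a list of minor-allele
               counts (0, 1, or 2) for the rare variants in one region.
    weights:   one non-negative weight per variant (hypothetical values
               standing in for weights derived from functional scores).
    phenotype: one quantitative trait value per sample.

    Returns (z, p_value) from a score test of the burden in a simple
    linear regression with an intercept and no other covariates.
    """
    n = len(phenotype)

    # Collapse each sample's variants into a single weighted burden score.
    burden = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    sbb = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    sby = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, phenotype))

    # No variation in the burden or the trait: nothing to test.
    if sbb == 0.0 or syy == 0.0:
        return 0.0, 1.0

    # Score statistic: n * r^2 is asymptotically chi-square with 1 df,
    # so z = sqrt(n) * r is asymptotically standard normal under the null.
    corr = sby / math.sqrt(sbb * syy)
    z = math.sqrt(n) * corr
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Toy data: 4 samples, 2 rare variants; the second variant is up-weighted
# as if it had stronger functional evidence (an assumption for illustration).
toy_genotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]
toy_weights = [1.0, 2.0]
toy_phenotype = [0.0, 1.0, 2.0, 3.0]
z_stat, p_val = weighted_burden_test(toy_genotypes, toy_weights, toy_phenotype)
```

Setting every weight to 1 reduces this sketch to an unweighted burden test, which makes the effect of the external weights easy to compare on the same region.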