Sequencing genes in silico using single nucleotide polymorphisms

BACKGROUND: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting diseas...

Bibliographic Details
Main Authors: Zhang, Xinyi Cindy, Zhang, Bo, Li, Shuying Sue, Huang, Xin, Hansen, John A, Zhao, Lue Ping
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283449/
https://www.ncbi.nlm.nih.gov/pubmed/22289434
http://dx.doi.org/10.1186/1471-2156-13-6
author Zhang, Xinyi Cindy
Zhang, Bo
Li, Shuying Sue
Huang, Xin
Hansen, John A
Zhao, Lue Ping
collection PubMed
description BACKGROUND: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. RESULTS: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (range: 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes.
CONCLUSIONS: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease-associated SNPs and regions, and facilitating the prioritization of candidate genes for more detailed functional and mechanistic studies.
format Online
Article
Text
id pubmed-3283449
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3283449 2012-02-22 Sequencing genes in silico using single nucleotide polymorphisms Zhang, Xinyi Cindy Zhang, Bo Li, Shuying Sue Huang, Xin Hansen, John A Zhao, Lue Ping BMC Genet Research Article BioMed Central 2012-01-30 /pmc/articles/PMC3283449/ /pubmed/22289434 http://dx.doi.org/10.1186/1471-2156-13-6 Text en Copyright ©2012 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequencing genes in silico using single nucleotide polymorphisms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283449/
https://www.ncbi.nlm.nih.gov/pubmed/22289434
http://dx.doi.org/10.1186/1471-2156-13-6