
Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study

Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy:...


Bibliographic Details
Main authors: Levin, Tera C., Glazer, Andrew M., Pachter, Lior, Brem, Rachel B., Eisen, Michael B.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912228/
https://www.ncbi.nlm.nih.gov/pubmed/20686598
http://dx.doi.org/10.1371/journal.pone.0011645
_version_ 1782184560674471936
author Levin, Tera C.
Glazer, Andrew M.
Pachter, Lior
Brem, Rachel B.
Eisen, Michael B.
author_facet Levin, Tera C.
Glazer, Andrew M.
Pachter, Lior
Brem, Rachel B.
Eisen, Michael B.
author_sort Levin, Tera C.
collection PubMed
description Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy: a “synthetic association study” in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. 226 of these polymorphic gene models (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes.
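The workflow summarized in the description (sample random allele combinations, run a predictor on each synthetic genome, then test which SNPs track the predicted phenotype) can be illustrated with a minimal sketch. This is not the authors' pipeline: `toy_predictor` is a hypothetical stand-in for a real gene prediction program, the six SNPs and three gene models are invented for illustration, and only the figure of 500 genomes is taken from the abstract.

```python
import random


def make_synthetic_genotypes(n_snps, n_genomes, seed=0):
    """Build genotypes as tuples of 0/1 alleles (0 = strain A, 1 = strain B),
    drawing each SNP's allele independently and uniformly at random."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(n_snps))
            for _ in range(n_genomes)]


def toy_predictor(genotype):
    """Hypothetical stand-in for a gene predictor: one label per gene model.
    Gene 0 depends on SNP 2 alone, gene 1 changes only when SNPs 0 and 4
    both carry the strain B allele (a simple epistasis), gene 2 never varies."""
    return (
        "alt" if genotype[2] else "ref",
        "alt" if genotype[0] and genotype[4] else "ref",
        "ref",
    )


def perfectly_correlated_snps(genotypes, phenotypes):
    """Return indices of SNPs whose two alleles map one-to-one onto two
    distinct phenotype states across all genomes."""
    hits = []
    for snp in range(len(genotypes[0])):
        allele_to_phenotype = {}
        consistent = True
        for genotype, phenotype in zip(genotypes, phenotypes):
            # setdefault records the first phenotype seen for this allele
            if allele_to_phenotype.setdefault(genotype[snp], phenotype) != phenotype:
                consistent = False
                break
        if consistent and len(set(allele_to_phenotype.values())) == 2:
            hits.append(snp)
    return hits


genotypes = make_synthetic_genotypes(n_snps=6, n_genomes=500, seed=1)
predictions = [toy_predictor(g) for g in genotypes]

results = {}
for gene in range(3):
    phenotypes = [p[gene] for p in predictions]
    if len(set(phenotypes)) == 1:
        results[gene] = None  # gene model is invariant across genomes
    else:
        results[gene] = perfectly_correlated_snps(genotypes, phenotypes)

for gene, hits in results.items():
    print(gene, hits)
```

In this toy setup, gene 0 is explained by a single SNP, while gene 1 varies without any single SNP accounting for it, which is the kind of case the abstract describes as having a more complex genetic architecture. Resolving such cases would need tests on combinations of SNPs, which this sketch does not attempt.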
format Text
id pubmed-2912228
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2912228 2010-08-03 Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study Levin, Tera C. Glazer, Andrew M. Pachter, Lior Brem, Rachel B. Eisen, Michael B. PLoS One Research Article Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy: a “synthetic association study” in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. 226 of these polymorphic gene models (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes. Public Library of Science 2010-07-29 /pmc/articles/PMC2912228/ /pubmed/20686598 http://dx.doi.org/10.1371/journal.pone.0011645 Text en Levin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Levin, Tera C.
Glazer, Andrew M.
Pachter, Lior
Brem, Rachel B.
Eisen, Michael B.
Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title_full Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title_fullStr Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title_full_unstemmed Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title_short Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
title_sort exploring the genetic basis of variation in gene predictions with a synthetic association study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912228/
https://www.ncbi.nlm.nih.gov/pubmed/20686598
http://dx.doi.org/10.1371/journal.pone.0011645
work_keys_str_mv AT levinterac exploringthegeneticbasisofvariationingenepredictionswithasyntheticassociationstudy
AT glazerandrewm exploringthegeneticbasisofvariationingenepredictionswithasyntheticassociationstudy
AT pachterlior exploringthegeneticbasisofvariationingenepredictionswithasyntheticassociationstudy
AT bremrachelb exploringthegeneticbasisofvariationingenepredictionswithasyntheticassociationstudy
AT eisenmichaelb exploringthegeneticbasisofvariationingenepredictionswithasyntheticassociationstudy