Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study
Main authors: | Levin, Tera C.; Glazer, Andrew M.; Pachter, Lior; Brem, Rachel B.; Eisen, Michael B. |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912228/ https://www.ncbi.nlm.nih.gov/pubmed/20686598 http://dx.doi.org/10.1371/journal.pone.0011645 |
_version_ | 1782184560674471936 |
---|---|
author | Levin, Tera C. Glazer, Andrew M. Pachter, Lior Brem, Rachel B. Eisen, Michael B. |
collection | PubMed |
description | Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time consuming. Here we introduce an alternative strategy: a “synthetic association study” in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms (SNPs) between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. Of these polymorphic gene models, 226 (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes. |
format | Text |
id | pubmed-2912228 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2912228 2010-08-03 PLoS One Research Article Public Library of Science 2010-07-29 /pmc/articles/PMC2912228/ /pubmed/20686598 http://dx.doi.org/10.1371/journal.pone.0011645 Text en Levin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study |
topic | Research Article |
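
The synthetic association study summarized in the description can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy, not the authors' pipeline or data: `STRAIN_A` and `SNPS` are an invented 25-base sequence with three SNPs, `predict_orfs` is a naive forward-strand ATG-to-stop ORF scanner standing in for a real gene predictor, and each predicted gene model is simplified to a presence/absence trait. The sketch builds 500 intermediate genomes with randomly sampled alleles, annotates each one, and maps a variable gene model to a SNP when its presence pattern matches that SNP's allele pattern (or its complement) in every genome.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical toy data (not from the study): the "strain A" sequence and
# SNPs given as (0-based position, strain-B allele).
STRAIN_A = "ATGAAACCCTAGTTATGGGGTAATT"
SNPS = [(10, "C"), (17, "A"), (7, "G")]


def build_genome(genotype):
    """Build an intermediate genome: use the strain-B allele where the bit is 1."""
    seq = list(STRAIN_A)
    for (pos, alt), bit in zip(SNPS, genotype):
        if bit:
            seq[pos] = alt
    return "".join(seq)


def predict_orfs(seq):
    """Hypothetical stand-in for a gene predictor.

    Returns (start, end) pairs for forward-strand ORFs that run from an ATG
    to the first in-frame stop codon; ORFs with no stop are not reported.
    """
    orfs = set()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orfs.add((start, i + 3))
                break
    return orfs


def synthetic_association(n_genomes=500, seed=0):
    """Map each variable gene model to SNPs whose alleles perfectly track it."""
    rng = random.Random(seed)
    # Each intermediate genome draws every SNP allele independently, 50/50.
    genotypes = [[rng.randint(0, 1) for _ in SNPS] for _ in range(n_genomes)]
    annotations = [predict_orfs(build_genome(g)) for g in genotypes]

    results = {}
    for model in sorted(set().union(*annotations)):
        present = [model in ann for ann in annotations]
        if all(present):
            continue  # invariant gene model: nothing to map
        # Perfect correlation: presence equals the allele, or its complement.
        results[model] = [
            s for s in range(len(SNPS))
            if all(p == bool(g[s]) for p, g in zip(present, genotypes))
            or all(p != bool(g[s]) for p, g in zip(present, genotypes))
        ]
    return results
```

With the default seed, `synthetic_association()` returns `{(0, 12): [0], (0, 18): []}`. The model ending at the TAG stop is explained by SNP 0 alone, while the read-through model `(0, 18)` appears only when SNP 0 and SNP 1 are both present: a toy case of SNP epistasis that a single-SNP test leaves unmapped. The invariant model `(14, 23)` is skipped.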