Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations

Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relat...

Bibliographic Details
Main authors: Mikhaylova, Anna V., Thornton, Timothy A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456650/
https://www.ncbi.nlm.nih.gov/pubmed/31001318
http://dx.doi.org/10.3389/fgene.2019.00261
author Mikhaylova, Anna V.
Thornton, Timothy A.
collection PubMed
description Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10^−16), where prediction accuracy is lower in the Yoruban population from West Africa compared to the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10^−16). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
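The evaluation described in the abstract (a per-gene Pearson correlation between predicted and observed expression, then an F-test for differences among populations) can be illustrated with a small sketch. This is one plausible reading of that description, not the authors' actual pipeline: the data are simulated, the population labels `POP_A`, `POP_B`, `POP_C`, the signal strengths, and the helper names are all hypothetical, and the F-test is assumed here to be a one-way ANOVA on the per-gene correlations.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_gene(rng, n_samples, signal):
    """Toy data (assumption): observed = signal * predicted + noise."""
    predicted = [rng.gauss(0, 1) for _ in range(n_samples)]
    observed = [signal * p + rng.gauss(0, 1) for p in predicted]
    return predicted, observed


rng = random.Random(0)

# Hypothetical signal strengths per population; these are NOT values from the paper.
signal_by_pop = {"POP_A": 0.8, "POP_B": 0.6, "POP_C": 0.2}

# Per-gene prediction accuracy: one Pearson r per simulated gene and population.
accuracy = {
    pop: [pearson_r(*simulate_gene(rng, 80, signal)) for _ in range(200)]
    for pop, signal in signal_by_pop.items()
}

f_stat = one_way_f(list(accuracy.values()))

for pop, r_values in accuracy.items():
    print(pop, "mean r:", round(sum(r_values) / len(r_values), 3))
print("F statistic:", round(f_stat, 2))
```

Converting the F statistic into a p-value requires the F distribution with (k − 1, N − k) degrees of freedom, which the standard library does not provide; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.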
format Online
Article
Text
id pubmed-6456650
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6456650 2019-04-18 Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations Mikhaylova, Anna V. Thornton, Timothy A. Front Genet Genetics Frontiers Media S.A. 2019-04-03 /pmc/articles/PMC6456650/ /pubmed/31001318 http://dx.doi.org/10.3389/fgene.2019.00261 Text en Copyright © 2019 Mikhaylova and Thornton. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations
topic Genetics