Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome‐wide association studies

In transcriptome‐wide association studies (TWAS), gene expression values are predicted using genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accura...

Bibliographic Details
Main authors: Fryett, James J., Morris, Andrew P., Cordell, Heather J.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8641384/
https://www.ncbi.nlm.nih.gov/pubmed/32190932
http://dx.doi.org/10.1002/gepi.22290
collection PubMed
description In transcriptome‐wide association studies (TWAS), gene expression values are predicted using genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six different methods—LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model, and Random Forests—by performing cross‐validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when ancestry of the prediction model training and testing populations is different, and (c) when the tissue used to train the model is different from the tissue to be predicted. We find that, for most genes, the expression cannot be accurately predicted, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the model training set size is reduced or when predicting across ancestries and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and the development of large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
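As a rough illustration of the cross-validation idea in the description above, the following minimal sketch estimates prediction accuracy for one simulated gene. It is not the authors' pipeline and does not use Geuvadis data: the single-SNP least-squares predictor, the simulated genotypes, the squared-correlation accuracy measure, and all function names (`fit_single_snp`, `squared_correlation`, `cross_validated_r2`, `simulate`) are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def fit_single_snp(genotypes, expression):
    """Least-squares fit of expression on one SNP's allele dosage (0/1/2).

    Returns (slope, intercept). This is a deliberately tiny stand-in for
    the prediction models named in the abstract, not one of them.
    """
    mean_g = statistics.fmean(genotypes)
    mean_e = statistics.fmean(expression)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_g == 0:
        # Monomorphic SNP in the training set: predict the mean expression.
        return 0.0, mean_e
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = cov / var_g
    return slope, mean_e - slope * mean_g


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def cross_validated_r2(genotypes, expression, n_folds=5, seed=0):
    """k-fold cross-validation: each sample is predicted by a model
    trained only on the other folds, then predictions are scored
    against the observed expression values."""
    indices = list(range(len(genotypes)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    predicted = [0.0] * len(genotypes)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_single_snp(
            [genotypes[i] for i in train],
            [expression[i] for i in train],
        )
        for i in held_out:
            predicted[i] = intercept + slope * genotypes[i]
    return squared_correlation(predicted, expression)


def simulate(n_samples, effect, seed=1):
    """Simulated data (an assumption): one SNP with allele frequency 0.5
    and expression = effect * dosage + standard normal noise."""
    rng = random.Random(seed)
    genotypes = [rng.choice([0, 1, 1, 2]) for _ in range(n_samples)]
    expression = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, expression


if __name__ == "__main__":
    g, e = simulate(400, effect=2.0)
    print("cross-validated R^2:", round(cross_validated_r2(g, e), 3))
```

Calling `simulate` with a smaller `n_samples` or a smaller `effect` gives a quick feel for how sample size and effect size change the cross-validated estimate in this toy setting.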
id pubmed-8641384
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8641384 2021-12-09 Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome‐wide association studies. Fryett, James J.; Morris, Andrew P.; Cordell, Heather J. Genet Epidemiol. Research Articles. John Wiley and Sons Inc. 2020-03-19 2020-07 /pmc/articles/PMC8641384/ /pubmed/32190932 http://dx.doi.org/10.1002/gepi.22290 Text en
© 2020 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Research Articles