Cargando…
Assessing polygenic risk score models for applications in populations with under-represented genomics data: an example of Vietnam
Most polygenic risk score (PRS)models have been based on data from populations of European origins (accounting for the majority of the large genomics datasets, e.g. >78% in the UK Biobank and >85% in the GTEx project). Although several large-scale Asian biobanks were initiated (e.g. Japanese,...
Autores principales: | , , , , , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677487/ https://www.ncbi.nlm.nih.gov/pubmed/36326078 http://dx.doi.org/10.1093/bib/bbac459 |
Sumario: | Most polygenic risk score (PRS)models have been based on data from populations of European origins (accounting for the majority of the large genomics datasets, e.g. >78% in the UK Biobank and >85% in the GTEx project). Although several large-scale Asian biobanks were initiated (e.g. Japanese, Korean, Han Chinese biobanks), most other Asian countries have little or near-zero genomics data. To implement PRS models for under-represented populations, we explored transfer learning approaches, assuming that information from existing large datasets can compensate for the small sample size that can be feasibly obtained in developing countries, like Vietnam. Here, we benchmark 13 common PRS methods in meta-population strategy (combining individual genotype data from multiple populations) and multi-population strategy (combining summary statistics from multiple populations). Our results highlight the complementarity of different populations and the choice of methods should depend on the target population. Based on these results, we discussed a set of guidelines to help users select the best method for their datasets. We developed a robust and comprehensive software to allow for benchmarking comparisons between methods and proposed a computational framework for improving PRS performance in a dataset with a small sample size. This work is expected to inform the development of genomics applications in under-represented populations. PRSUP framework is available at: https://github.com/BiomedicalMachineLearning/VGP |
---|