
A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization

Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to suffer from the fact that they have a considerable...


Bibliographic Details
Main authors: Popescu, Andrei-Alin, Harper, Andrea L., Trick, Martin, Bancroft, Ian, Huber, Katharina T.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256762/
https://www.ncbi.nlm.nih.gov/pubmed/25326237
http://dx.doi.org/10.1534/genetics.114.171314
author Popescu, Andrei-Alin
Harper, Andrea L.
Trick, Martin
Bancroft, Ian
Huber, Katharina T.
collection PubMed
description Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to suffer from the fact that they have a considerable computational burden, limiting their applicability when used with large datasets, such as those produced by next-generation sequencing techniques. To address this, nonmodel-based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko.
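The description names two ingredients, linear kernel-PCA and least-squares optimization, without giving details. The following is a minimal illustrative sketch of how those two pieces could fit together. It is not the PSIKO implementation. It assumes a genotype matrix with individuals in rows and SNPs in columns, and it assumes the positions of the founder populations in principal-component space are already known, whereas the abstract states that PSIKO infers the number of founder populations itself. The function names `linear_kernel_pca` and `admixture_by_least_squares` are hypothetical.

```python
import numpy as np


def linear_kernel_pca(genotypes, n_components):
    """Project individuals onto the top principal components via a linear kernel."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                # centre each SNP column
    K = X @ X.T                           # linear kernel: n x n Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Component scores: eigenvectors scaled by the square root of their eigenvalues.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def admixture_by_least_squares(scores, founder_points):
    """Least-squares mixing coefficients of each individual over founder points.

    ASSUMPTION: founder_points (k x d) are given; this sketch does not infer them.
    """
    V = np.asarray(founder_points, dtype=float)     # (k, d)
    P = np.asarray(scores, dtype=float)             # (n, d)
    k = V.shape[0]
    # A row of ones asks the coefficients to sum to 1 (exact when A is
    # square and invertible, approximate otherwise).
    A = np.vstack([V.T, np.ones((1, k))])           # (d + 1, k)
    B = np.vstack([P.T, np.ones((1, P.shape[0]))])  # (d + 1, n)
    Q, *_ = np.linalg.lstsq(A, B, rcond=None)       # (k, n)
    Q = np.clip(Q.T, 0.0, None)                     # drop negative coefficients
    return Q / Q.sum(axis=1, keepdims=True)         # renormalise each row
```

Working with the n x n Gram matrix rather than the SNP covariance matrix keeps the eigendecomposition size tied to the number of individuals, which is one plausible reason a kernel formulation suits long sequences; the timings quoted in the description refer to PSIKO itself, not to this sketch.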
format Online
Article
Text
id pubmed-4256762
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4256762 2014-12-08 Genetics Investigations Genetics Society of America 2014-12 2014-10-16 /pmc/articles/PMC4256762/ /pubmed/25326237 http://dx.doi.org/10.1534/genetics.114.171314 Text en Copyright © 2014 by the Genetics Society of America. Available freely online through the author-supported open access option.
title A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization
topic Investigations