A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization
Main authors: | Popescu, Andrei-Alin; Harper, Andrea L.; Trick, Martin; Bancroft, Ian; Huber, Katharina T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256762/ https://www.ncbi.nlm.nih.gov/pubmed/25326237 http://dx.doi.org/10.1534/genetics.114.171314 |
author | Popescu, Andrei-Alin; Harper, Andrea L.; Trick, Martin; Bancroft, Ian; Huber, Katharina T. |
collection | PubMed |
description | Population structure is a confounding factor in genome-wide association studies, increasing the rate of false-positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to carry a considerable computational burden, which limits their applicability to large datasets such as those produced by next-generation sequencing techniques. To address this, nonmodel-based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and the number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and an accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko. |
format | Online Article Text |
id | pubmed-4256762 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-4256762; 2014-12-08; Genetics; Investigations; Genetics Society of America; 2014-12; 2014-10-16; /pmc/articles/PMC4256762/; /pubmed/25326237; http://dx.doi.org/10.1534/genetics.114.171314; Text; en; Copyright © 2014 by the Genetics Society of America. Available freely online through the author-supported open access option. |
title | A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization |
topic | Investigations |
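The record's description names two generic ingredients, linear kernel-PCA and least-squares optimization, but gives no algorithmic detail. The Python sketch below illustrates those two ideas only; it is not PSIKO's implementation and does not infer the number of founder populations. Its assumptions are stated plainly: genotypes are coded as 0/1/2 allele counts per SNP; eigenvectors are found by plain power iteration; founder positions in PC space are supplied by the caller (for example, the mean scores of individuals assumed to be unadmixed); and admixture coefficients are taken to be the nonnegative, sum-to-one weights whose combination of founder positions lies closest to an individual's PC scores, solved here by projected gradient descent as a swapped-in solver. The names `linear_kernel_pca`, `project_to_simplex`, and `admixture_coefficients` are hypothetical.

```python
# Hypothetical sketch, not PSIKO's implementation. Assumptions: genotypes are
# 0/1/2 allele counts per SNP, and founder positions in PC space are supplied
# by the caller. All function names are illustrative.
import math
import random


def linear_kernel_pca(genotypes, n_components, iters=500, seed=0):
    """Return PC scores (one row per individual) from a linear kernel."""
    n = len(genotypes)
    # Linear kernel: K[i][j] is the inner product of individuals i and j.
    K = [[sum(a * b for a, b in zip(gi, gj)) for gj in genotypes]
         for gi in genotypes]
    # Double-centre K; this equals the Gram matrix of SNP-centred genotypes.
    row_means = [sum(row) / n for row in K]
    grand = sum(row_means) / n
    Kc = [[K[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
          for i in range(n)]

    rng = random.Random(seed)
    scores = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        # Power iteration for the leading eigenvector of the (deflated) kernel.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # kernel is numerically zero: no variance left
            v = [x / norm for x in w]
        Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Kv))  # eigenvalue estimate
        # PC score of individual i on this component is sqrt(lambda) * v[i].
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            scores[i][c] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvalue.
        Kc = [[Kc[i][j] - lam * v[i] * v[j] for j in range(n)]
              for i in range(n)]
    return scores


def project_to_simplex(v):
    """Euclidean projection onto {q : q >= 0, sum(q) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last index passing this test fixes the shift
    return [max(x - theta, 0.0) for x in v]


def admixture_coefficients(score, founders, steps=2000):
    """Weights q (q >= 0, sum q = 1) minimising ||sum_k q_k founders[k] - score||^2,
    found by projected gradient descent."""
    k, d = len(founders), len(score)
    q = [1.0 / k] * k
    # Step 1 / (2 * ||F||_F^2) is at most 1/L, where L is the gradient's
    # Lipschitz constant, so each projected step does not increase the loss.
    frob = sum(x * x for f in founders for x in f)
    lr = 1.0 / (2.0 * frob) if frob > 0 else 0.0
    for _ in range(steps):
        # Residual r = F q - score, with founder k as column k of F.
        r = [sum(q[m] * founders[m][c] for m in range(k)) - score[c]
             for c in range(d)]
        grad = [2.0 * sum(founders[m][c] * r[c] for c in range(d))
                for m in range(k)]
        q = project_to_simplex([q[m] - lr * grad[m] for m in range(k)])
    return q
```

For example, with the toy genotypes `[[0,0,2,2], [0,0,2,2], [2,2,0,0], [2,2,0,0], [1,1,1,1]]`, one principal component separates the two groups, and using each group's mean score as a founder position gives the last individual coefficients of about 0.5 for each founder. Power iteration keeps the sketch dependency-free; with NumPy, calling `numpy.linalg.eigh` on the centred kernel would be the usual choice.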