Highlighting nonlinear patterns in population genetics datasets

Bibliographic Details
Main authors: Alanis-Lobato, Gregorio; Cannistraci, Carlo Vittorio; Eriksson, Anders; Manica, Andrea; Ravasi, Timothy
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group, 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4311249/
https://www.ncbi.nlm.nih.gov/pubmed/25633916
http://dx.doi.org/10.1038/srep08140
author Alanis-Lobato, Gregorio
Cannistraci, Carlo Vittorio
Eriksson, Anders
Manica, Andrea
Ravasi, Timothy
collection PubMed
description Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
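The description contrasts linear PCA with a nonlinear, minimum-curvilinear embedding. As an illustrative sketch only (not the authors' code), the Python below computes minimum curvilinear distances under an assumed definition: build a minimum spanning tree (MST) over the pairwise Euclidean distances, then take each pair's distance as the length of the tree path joining them. The function names and the toy data are hypothetical, and the step that turns such a matrix into ncMCE coordinates is not reproduced here.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns an adjacency list: node index -> list of (neighbour, edge_length).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        # Attach the outside node that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = euclidean(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        # Relax the connection cost of every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def minimum_curvilinear_distances(points):
    """Pairwise distances measured along the MST instead of in a straight line."""
    n = len(points)
    adj = minimum_spanning_tree(points)
    mc = [[0.0] * n for _ in range(n)]
    for src in range(n):
        # Paths in a tree are unique, so a plain depth-first walk accumulates
        # the curvilinear distance from `src` to every other node.
        stack, seen = [(src, 0.0)], {src}
        while stack:
            u, dist = stack.pop()
            mc[src][u] = dist
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, dist + w))
    return mc


if __name__ == "__main__":
    # Hypothetical toy data: seven points tracing a "U" shape.
    pts = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
    mc = minimum_curvilinear_distances(pts)
    print(euclidean(pts[0], pts[6]), mc[0][6])  # prints 2.0 6.0
```

On this toy set the two ends of the "U" are 2.0 apart in a straight line but 6.0 apart along the tree. A linear projection such as PCA works from the straight-line geometry, whereas a curvilinear distance keeps the two ends far apart, which is the kind of nonlinear pattern the description refers to.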
format Online
Article
Text
id pubmed-4311249
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4311249 2015-02-09
Sci Rep, Article. Nature Publishing Group, 2015-01-30. /pmc/articles/PMC4311249/ /pubmed/25633916 http://dx.doi.org/10.1038/srep08140
Text, en. Copyright © 2015, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Highlighting nonlinear patterns in population genetics datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4311249/
https://www.ncbi.nlm.nih.gov/pubmed/25633916
http://dx.doi.org/10.1038/srep08140