
Subgroup detection in genotype data using invariant coordinate selection



Bibliographic Details
Main Authors: Fischer, Daniel; Honkatukia, Mervi; Tuiskula-Haavisto, Maria; Nordhausen, Klaus; Cavero, David; Preisinger, Rudolf; Vilkki, Johanna
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5356247/
https://www.ncbi.nlm.nih.gov/pubmed/28302061
http://dx.doi.org/10.1186/s12859-017-1589-9
collection PubMed
description BACKGROUND: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually missing from the analyst's toolbox and are therefore only rarely applied. RESULTS: We present a modern dimension reduction method called ‘Invariant Coordinate Selection’ (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is, in this framework, just a special case of ICS. We use ICS on both a simulated and a real dataset. First, we demonstrate some deficiencies of PCA and show how ICS is able to recover the correct subgroups within the simulated data. Second, we apply ICS to a chicken dataset and again detect two subgroups. These subgroups are then investigated further with respect to their genotype to provide additional evidence of the biological relevance of the detected subgroup division. We also compare the performance of ICS with five other popular dimension reduction methods. CONCLUSION: ICS was able to detect subgroups in data where PCA fails to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application does not add any computational burden to the analysis pipeline. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1589-9) contains supplementary material, which is available to authorized users.
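The abstract does not spell out how ICS works. As a rough illustration only, the sketch below assumes a common ICS variant. It simultaneously diagonalizes two scatter matrices: the ordinary covariance matrix and the fourth-moment scatter Cov4, a pair commonly used with the R package ICS. Whether the paper uses exactly this pair is not stated here. The resulting invariant coordinates are ordered by their generalized eigenvalues. The helper names `ics` and `simulate`, the toy simulation design and all parameter values are hypothetical and are not the authors' setup. The toy data hide two balanced subgroups along a low-variance direction. As a result, the first principal component follows unstructured high-variance noise, while the ICS coordinate with the smallest eigenvalue follows the subgroups.

```python
import numpy as np


def ics(X):
    """Sketch of ICS with the scatter pair (Cov, Cov4).

    Returns the generalized eigenvalues in ascending order and the matrix of
    invariant coordinates (one column per eigenvalue).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # First scatter S1: the ordinary covariance matrix.
    s1 = np.cov(Xc, rowvar=False)

    # Whiten the data with the symmetric inverse square root of S1.
    w, U = np.linalg.eigh(s1)
    Y = Xc @ (U @ np.diag(1.0 / np.sqrt(w)) @ U.T)

    # Second scatter S2: Cov4, the covariance weighted by squared Mahalanobis
    # distances and scaled by 1/(p + 2), so that for Gaussian data it is close
    # to the identity in whitened coordinates. In whitened coordinates the
    # Mahalanobis distance is just the Euclidean norm.
    r2 = np.sum(Y**2, axis=1)
    s2 = (Y * r2[:, None]).T @ Y / (n * (p + 2))

    # Diagonalizing Cov4 of the whitened data solves S2 b = lambda S1 b;
    # eigh returns the eigenvalues in ascending order.
    lam, V = np.linalg.eigh(s2)
    return lam, Y @ V


def simulate(n=10000, p=5, shift=2.5, noise_sd=4.0, seed=1):
    """Hypothetical toy data: two balanced subgroups along one low-variance axis."""
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1.0, 1.0], size=n)
    X = rng.normal(size=(n, p))
    X[:, 0] += shift * labels      # subgroup structure, variance 1 + shift**2
    X[:, 1:] *= noise_sd           # unstructured noise, variance noise_sd**2
    Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
    return X @ Q, labels           # a random rotation hides the informative axis


X, labels = simulate()

# ICS: the invariant coordinate with the smallest eigenvalue.
lam, Z = ics(X)
ics_corr = abs(np.corrcoef(Z[:, 0], labels)[0, 1])

# PCA: the first principal component, for comparison.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_corr = abs(np.corrcoef(Xc @ Vt[0], labels)[0, 1])

print(f"|corr| with subgroup labels: ICS {ics_corr:.2f}, PCA {pca_corr:.2f}")
```

In a balanced two-group mixture, the separating direction is lighter-tailed than a Gaussian. That is why it appears at the small end of the eigenvalue spectrum in this sketch. With strongly unbalanced groups it would typically appear at the large end instead, so in practice both extremes of the spectrum are worth inspecting.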
id pubmed-5356247
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5356247 2017-03-22 BMC Bioinformatics, Methodology Article
BioMed Central, 2017-03-16. © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article