Subgroup detection in genotype data using invariant coordinate selection
BACKGROUND: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually missing from the analyst’s toolbox and hence are only rarely applied. RESULTS: We pre...
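Main authors: | Fischer, Daniel; Honkatukia, Mervi; Tuiskula-Haavisto, Maria; Nordhausen, Klaus; Cavero, David; Preisinger, Rudolf; Vilkki, Johanna |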
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5356247/ https://www.ncbi.nlm.nih.gov/pubmed/28302061 http://dx.doi.org/10.1186/s12859-017-1589-9 |
_version_ | 1782515785359425536 |
---|---|
author | Fischer, Daniel; Honkatukia, Mervi; Tuiskula-Haavisto, Maria; Nordhausen, Klaus; Cavero, David; Preisinger, Rudolf; Vilkki, Johanna |
author_sort | Fischer, Daniel |
collection | PubMed |
description | BACKGROUND: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually missing from the analyst’s toolbox and hence are only rarely applied. RESULTS: We present a modern dimension reduction method called ‘Invariant Coordinate Selection’ (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is, in this framework, just a special case of ICS. We use ICS on both a simulated and a real dataset. First, we demonstrate some deficiencies of PCA and show how ICS is capable of recovering the correct subgroups within the simulated data. Second, we apply the ICS method to a chicken dataset and detect two subgroups there as well. These subgroups are then investigated with respect to their genotype to provide further evidence of the biological relevance of the detected subgroup division. In addition, we compare the performance of ICS to five other popular dimension reduction methods. CONCLUSION: The ICS method was able to detect subgroups in data where PCA fails to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application does not add any computational burden to the analysis pipeline. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1589-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5356247 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5356247 2017-03-22 Subgroup detection in genotype data using invariant coordinate selection Fischer, Daniel Honkatukia, Mervi Tuiskula-Haavisto, Maria Nordhausen, Klaus Cavero, David Preisinger, Rudolf Vilkki, Johanna BMC Bioinformatics Methodology Article
BioMed Central 2017-03-16 /pmc/articles/PMC5356247/ /pubmed/28302061 http://dx.doi.org/10.1186/s12859-017-1589-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Subgroup detection in genotype data using invariant coordinate selection |
title_sort | subgroup detection in genotype data using invariant coordinate selection |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5356247/ https://www.ncbi.nlm.nih.gov/pubmed/28302061 http://dx.doi.org/10.1186/s12859-017-1589-9 |
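The description above names Invariant Coordinate Selection (ICS) and notes that ICA is a special case of it, without giving the computation. The following is a minimal sketch of the general idea, not the authors' implementation: ICS jointly diagonalizes two scatter matrices, and here the pair is assumed to be the ordinary covariance matrix and a fourth-moment scatter matrix, a common choice that corresponds to the FOBI variant of ICA. The record does not state which scatter pair the article uses, and the function names `cov4` and `ics` as well as the simulated data are illustrative assumptions.

```python
import numpy as np


def cov4(X):
    """Fourth-moment scatter: outer products weighted by squared Mahalanobis distance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    # Squared Mahalanobis distance of every row with respect to the sample covariance.
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))


def ics(X):
    """Return invariant coordinates Z, generalized eigenvalues, and unmixing matrix B.

    B satisfies B S1 B^T = I and B S2 B^T = diag(eigenvalues), where S1 is the
    covariance matrix and S2 is the fourth-moment scatter (assumed scatter pair).
    """
    Xc = X - X.mean(axis=0)
    S1 = np.cov(Xc, rowvar=False)
    S2 = cov4(X)
    # Symmetric inverse square root of S1 (whitening step).
    w, V = np.linalg.eigh(S1)
    S1_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    # Eigendecomposition of S2 expressed in the whitened coordinates.
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    kurt, U = np.linalg.eigh(M)
    order = np.argsort(kurt)[::-1]  # sort components by decreasing eigenvalue
    kurt, U = kurt[order], U[:, order]
    B = U.T @ S1_inv_sqrt
    return Xc @ B.T, kurt, B


if __name__ == "__main__":
    # Hypothetical toy data: two groups shifted along one hidden direction,
    # then mixed with an arbitrary linear map.
    rng = np.random.default_rng(1)
    groups = rng.integers(0, 2, size=400)
    latent = rng.normal(size=(400, 5))
    latent[:, 0] += 4.0 * groups
    X = latent @ rng.normal(size=(5, 5))
    Z, kurt, B = ics(X)
    print("generalized eigenvalues:", np.round(kurt, 3))
```

In this scatter pair, components whose eigenvalue lies far from 1 in either direction are the candidates for non-Gaussian structure such as subgroups, so it is usually worth plotting both the first and the last columns of `Z` rather than only the leading ones, as one would with PCA.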