A structured overview of simultaneous component based data integration
BACKGROUND: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement t...
Main Authors: | Van Deun, Katrijn; Smilde, Age K; van der Werf, Mariët J; Kiers, Henk AL; Van Mechelen, Iven |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2752463/ https://www.ncbi.nlm.nih.gov/pubmed/19671149 http://dx.doi.org/10.1186/1471-2105-10-246 |
_version_ | 1782172289380384768 |
---|---|
author | Van Deun, Katrijn; Smilde, Age K; van der Werf, Mariët J; Kiers, Henk AL; Van Mechelen, Iven |
author_facet | Van Deun, Katrijn; Smilde, Age K; van der Werf, Mariët J; Kiers, Henk AL; Van Mechelen, Iven |
author_sort | Van Deun, Katrijn |
collection | PubMed |
description | BACKGROUND: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results. RESULTS: We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods. CONCLUSION: Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. Especially, the type of weighting of the different matrices is essential for simultaneous component analysis. These types are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays. |
format | Text |
id | pubmed-2752463 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2752463 2009-09-26 A structured overview of simultaneous component based data integration Van Deun, Katrijn; Smilde, Age K; van der Werf, Mariët J; Kiers, Henk AL; Van Mechelen, Iven BMC Bioinformatics Research Article BioMed Central 2009-08-11 /pmc/articles/PMC2752463/ /pubmed/19671149 http://dx.doi.org/10.1186/1471-2105-10-246 Text en Copyright © 2009 Van Deun et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A structured overview of simultaneous component based data integration |
title_sort | structured overview of simultaneous component based data integration |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2752463/ https://www.ncbi.nlm.nih.gov/pubmed/19671149 http://dx.doi.org/10.1186/1471-2105-10-246 |
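
The description above says that simultaneous component methods can be framed in a principal components setting and that the weighting of the different matrices is essential. As a rough illustration only, and not the authors' implementation, the following Python sketch centers several data blocks that share their row mode, weights each block, concatenates them column-wise, and extracts components with a singular value decomposition. The function name `simultaneous_components` and the option strings `"frobenius"`, `"first_singular_value"`, and `"none"` are hypothetical names introduced for this example; they are not taken from the article.

```python
import numpy as np


def simultaneous_components(blocks, n_components=2, weighting="frobenius"):
    """Sketch of a simultaneous component analysis on blocks sharing their rows.

    blocks       : list of 2-D arrays, each of shape (n_rows, n_vars_k)
    n_components : number of common components to extract
    weighting    : hypothetical option names for this sketch:
                   "frobenius"            -> each block scaled to unit sum of squares
                   "first_singular_value" -> each block divided by its largest singular value
                   "none"                 -> blocks only centered
    """
    weighted = []
    for X in blocks:
        Xc = X - X.mean(axis=0)  # pre-processing: center every variable
        if weighting == "frobenius":
            w = 1.0 / np.linalg.norm(Xc)  # Frobenius norm of the block
        elif weighting == "first_singular_value":
            w = 1.0 / np.linalg.svd(Xc, compute_uv=False)[0]
        elif weighting == "none":
            w = 1.0
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
        weighted.append(w * Xc)

    # Concatenate along the variable mode; the rows are the shared mode.
    X_all = np.hstack(weighted)
    U, s, Vt = np.linalg.svd(X_all, full_matrices=False)

    scores = U[:, :n_components] * s[:n_components]  # common component scores
    loadings = Vt[:n_components].T                   # loadings for all variables

    # Split the loadings back into one matrix per block.
    split_points = np.cumsum([X.shape[1] for X in blocks])[:-1]
    return scores, np.split(loadings, split_points, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block_a = rng.normal(size=(10, 5))           # small block
    block_b = 100.0 * rng.normal(size=(10, 50))  # large, high-variance block
    for option in ("none", "frobenius", "first_singular_value"):
        scores, loadings = simultaneous_components([block_a, block_b], 2, option)
        print(option, scores.shape, [P.shape for P in loadings])
```

Without weighting, the block with more variables and larger variance tends to dominate the common components; the two weighting options are simple ways to balance the blocks, which is one reading of the "fair integration" idea mentioned in the description.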