
Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework

BACKGROUND: In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue...


Bibliographic Details
Main Authors:	Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048483/ https://www.ncbi.nlm.nih.gov/pubmed/27716030 http://dx.doi.org/10.1186/s12859-016-1273-5

author	Voillet, Valentin Besse, Philippe Liaubet, Laurence San Cristobal, Magali González, Ignacio
collection	PubMed
description	BACKGROUND: In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. RESULTS: We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches, i.e., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches, i.e., confidence ellipses and convex hulls, to visualize and assess the uncertainty due to missing values were also described. We showed how the areas of ellipses and convex hulls increased with the number of missing individuals. A free and easy-to-use code was proposed to implement the MI-MFA method in the R statistical environment. CONCLUSIONS: We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true configuration even when many individuals were missing in several data tables. This method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1273-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5048483
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5048483 2016-10-11 Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework Voillet, Valentin Besse, Philippe Liaubet, Laurence San Cristobal, Magali González, Ignacio BMC Bioinformatics Methodology Article BioMed Central 2016-10-03 /pmc/articles/PMC5048483/ /pubmed/27716030 http://dx.doi.org/10.1186/s12859-016-1273-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048483/ https://www.ncbi.nlm.nih.gov/pubmed/27716030 http://dx.doi.org/10.1186/s12859-016-1273-5
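The MI-MFA procedure summarized in the abstract (impute the missing rows M times, run MFA on each completed dataset, combine the M configurations into a consensus, then quantify the spread) can be illustrated with a minimal Python sketch. This is not the authors' R code. Several pieces are assumptions made only for illustration: the hot-deck imputation (copying a randomly drawn observed row), the Procrustes-alignment-then-average consensus, the normal-theory 95% ellipse area, and all function names (`mfa_coords`, `impute_rows`, `procrustes_align`, `mi_mfa`, `ellipse_areas`). The abstract does not specify how imputation or the consensus step is carried out.

```python
import numpy as np


def mfa_coords(tables, ncomp=2):
    """MFA sketch: standardize each table, weight it by its first
    singular value, then run a global PCA on the concatenation."""
    blocks = []
    for X in tables:
        Z = X - X.mean(axis=0)
        sd = Z.std(axis=0)
        sd[sd == 0] = 1.0                      # guard against constant columns
        Z = Z / sd
        s1 = np.linalg.svd(Z, compute_uv=False)[0]
        blocks.append(Z / s1)                  # balance the tables' influence
    U, S, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return U[:, :ncomp] * S[:ncomp]            # coordinates of individuals


def impute_rows(X, rng):
    """Assumed hot-deck imputation: fill each all-NaN row with a copy of
    a randomly drawn observed row from the same table."""
    X = X.copy()
    missing = np.isnan(X).all(axis=1)
    observed = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        X[i] = X[rng.choice(observed)]
    return X


def procrustes_align(F, target):
    """Rotate/reflect F to best match target (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(F.T @ target)
    return F @ (U @ Vt)


def mi_mfa(tables, M=20, ncomp=2, seed=0):
    """Return an assumed consensus (mean of aligned configurations)
    together with the M aligned configurations."""
    rng = np.random.default_rng(seed)
    configs = [mfa_coords([impute_rows(X, rng) for X in tables], ncomp)
               for _ in range(M)]
    aligned = [procrustes_align(F, configs[0]) for F in configs]
    return np.mean(aligned, axis=0), aligned


def ellipse_areas(aligned, chi2_q=5.991):
    """Area of a normal-theory 95% ellipse per individual over the M
    configurations (2 components; 5.991 is the chi-square quantile, 2 df)."""
    A = np.stack(aligned)                      # shape (M, n, 2)
    areas = []
    for i in range(A.shape[1]):
        cov = np.cov(A[:, i, :], rowvar=False)
        areas.append(np.pi * chi2_q * np.sqrt(max(np.linalg.det(cov), 0.0)))
    return np.array(areas)


# Toy data: two tables sharing a 2-D latent structure; three individuals
# are absent from the second table.
rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 2))
table_a = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(30, 6))
table_b = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(30, 4))
table_b[[3, 7, 11]] = np.nan
consensus, aligned = mi_mfa([table_a, table_b], M=10)
areas = ellipse_areas(aligned)
print(consensus.shape, areas.shape)
```

Alignment before averaging matters because principal axes are only defined up to rotation and reflection, so the M configurations are not directly comparable component by component. Larger ellipse areas flag individuals whose positions are less stable across imputations.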


Similar Items