Methylation data imputation performances under different representations and missingness patterns

BACKGROUND: High-throughput technologies enable the cost-effective collection and analysis of DNA methylation data throughout the human genome. This naturally entails missing values management that can complicate the analysis of the data. Several general and specific imputation methods are suitable...

Bibliographic Details
Main authors:	Lena, Pietro Di; Sala, Claudia; Prodi, Andrea; Nardini, Christine
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7325236/ https://www.ncbi.nlm.nih.gov/pubmed/32600298 http://dx.doi.org/10.1186/s12859-020-03592-5

_version_	1783552110989672448
author	Lena, Pietro Di Sala, Claudia Prodi, Andrea Nardini, Christine
author_sort	Lena, Pietro Di
collection	PubMed
description	BACKGROUND: High-throughput technologies enable the cost-effective collection and analysis of DNA methylation data throughout the human genome. This naturally entails managing missing values, which can complicate the analysis of the data. Several general and specific imputation methods are suitable for DNA methylation data. However, there are no detailed studies of their performance under different missing data mechanisms, (completely) at random or not, and different representations of DNA methylation levels (β- and M-values). RESULTS: We extensively analyze the imputation performance of seven methods on simulated missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) methylation data. We further consider imputation performance on the popular β- and M-value representations of methylation levels. Overall, β-values enable better imputation performance than M-values. Imputation accuracy is lower for mid-range β-values, while it is generally higher for values at the extremes of the β-value range. The distribution of MAR values is on average denser in the mid-range than the expected β-value distribution. As a consequence, MAR values are on average harder to impute. CONCLUSIONS: The results of the analysis provide guidelines for the most suitable imputation approaches for DNA methylation data under different representations of DNA methylation levels and different missing data mechanisms.
format	Online Article Text
id	pubmed-7325236
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7325236 2020-06-30 Methylation data imputation performances under different representations and missingness patterns Lena, Pietro Di Sala, Claudia Prodi, Andrea Nardini, Christine BMC Bioinformatics Research Article BioMed Central 2020-06-29 /pmc/articles/PMC7325236/ /pubmed/32600298 http://dx.doi.org/10.1186/s12859-020-03592-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Methylation data imputation performances under different representations and missingness patterns
topic	Research Article
