Missing value imputation for epistatic MAPs

Bibliographic Details
Main authors: Ryan, Colm; Greene, Derek; Cagney, Gerard; Cunningham, Pádraig
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873538/
https://www.ncbi.nlm.nih.gov/pubmed/20406472
http://dx.doi.org/10.1186/1471-2105-11-197
author Ryan, Colm
Greene, Derek
Cagney, Gerard
Cunningham, Pádraig
collection PubMed
description BACKGROUND: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore widen the range of possible analyses and increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.
RESULTS: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore, our method potentially expands the number of mapped epistatic interactions. In addition, we make implementations of our algorithms available for use by other researchers.
CONCLUSIONS: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor-based approaches, as they offer consistently accurate imputations across multiple datasets in a tractable manner.
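The abstract describes nearest neighbor imputation adapted to symmetric pairwise data but gives no algorithmic detail. The following is only a minimal sketch of that general idea, not the authors' published implementation. The function name `knn_impute_symmetric`, the parameter `k`, the root-mean-square distance over shared observed scores, and the rule of averaging the two directional estimates are all assumptions made for illustration. A zero fill, the baseline the abstract compares against, is used here as the fallback when no neighbor is available.

```python
import numpy as np


def knn_impute_symmetric(scores, k=2):
    """Fill NaN entries of a symmetric score matrix from similar genes.

    Hypothetical helper for illustration; the diagonal is left untouched.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    estimates = np.full((n, n), np.nan)

    for i in range(n):
        for j in range(n):
            if i == j or not np.isnan(scores[i, j]):
                continue
            candidates = []
            for g in range(n):
                # A neighbor must have an observed score with gene j.
                if g in (i, j) or np.isnan(scores[g, j]):
                    continue
                # Compare profiles only where both genes have observed scores.
                shared = ~np.isnan(scores[i]) & ~np.isnan(scores[g])
                if not shared.any():
                    continue
                diff = scores[i, shared] - scores[g, shared]
                candidates.append((np.sqrt(np.mean(diff ** 2)), scores[g, j]))
            candidates.sort(key=lambda c: c[0])
            if candidates:
                # Directional estimate: mean score of the k closest genes to i.
                estimates[i, j] = np.mean([v for _, v in candidates[:k]])

    filled = scores.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not np.isnan(scores[i, j]):
                continue
            # Combine the estimate from i's neighbors with the one from j's
            # neighbors so the result stays symmetric.
            pair = [e for e in (estimates[i, j], estimates[j, i]) if not np.isnan(e)]
            value = float(np.mean(pair)) if pair else 0.0  # zero-fill fallback
            filled[i, j] = filled[j, i] = value
    return filled


if __name__ == "__main__":
    nan = np.nan
    demo = np.array([
        [nan, 1.0, 2.0, nan],
        [1.0, nan, 2.0, 3.0],
        [2.0, 2.0, nan, 3.0],
        [nan, 3.0, 3.0, nan],
    ])
    print(knn_impute_symmetric(demo, k=2))
```

Averaging the two directional estimates is one simple way to respect the symmetry of the matrix; a weighted combination would be an equally plausible choice under the same assumptions.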
format Text
id pubmed-2873538
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2873538 2010-05-20 Missing value imputation for epistatic MAPs Ryan, Colm Greene, Derek Cagney, Gerard Cunningham, Pádraig BMC Bioinformatics Research article BioMed Central 2010-04-20 /pmc/articles/PMC2873538/ /pubmed/20406472 http://dx.doi.org/10.1186/1471-2105-11-197 Text en Copyright ©2010 Ryan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Missing value imputation for epistatic MAPs
topic Research article