The Information Content of Discrete Functions and Their Application in Genetic Data Analysis

The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued p...

Bibliographic Details
Main Authors: Sakhanenko, Nikita A., Kunert-Graf, James, Galas, David J.
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5729883/
https://www.ncbi.nlm.nih.gov/pubmed/29028175
http://dx.doi.org/10.1089/cmb.2017.0143
_version_ 1783286268888612864
author Sakhanenko, Nikita A.
Kunert-Graf, James
Galas, David J.
collection PubMed
description The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the inference of functional form problem. We approach here the third component of inferring functional forms based on information encoded in the functions. We present here a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis—that of inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.
format Online
Article
Text
id pubmed-5729883
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Mary Ann Liebert, Inc.
record_format MEDLINE/PubMed
spelling pubmed-5729883 2017-12-15 The Information Content of Discrete Functions and Their Application in Genetic Data Analysis Sakhanenko, Nikita A. Kunert-Graf, James Galas, David J. J Comput Biol Research Articles
Mary Ann Liebert, Inc. 2017-12-01 2017-12-01 /pmc/articles/PMC5729883/ /pubmed/29028175 http://dx.doi.org/10.1089/cmb.2017.0143 Text en © Nikita A. Sakhanenko, et al., 2017. Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title The Information Content of Discrete Functions and Their Application in Genetic Data Analysis
topic Research Articles