
Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of ident...


Bibliographic Details
Main authors: Aktulga, Hasan Metin, Kontoyiannis, Ioannis, Lyznik, L Alex, Szpankowski, Lukasz, Grama, Ananth Y, Szpankowski, Wojciech
Format: Online Article Text
Language: English
Published: Springer 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171327/
https://www.ncbi.nlm.nih.gov/pubmed/18301721
http://dx.doi.org/10.1155/2007/14741
_version_ 1782211737590693888
author Aktulga, Hasan Metin
Kontoyiannis, Ioannis
Lyznik, L Alex
Szpankowski, Lukasz
Grama, Ananth Y
Szpankowski, Wojciech
author_facet Aktulga, Hasan Metin
Kontoyiannis, Ioannis
Lyznik, L Alex
Szpankowski, Lukasz
Grama, Ananth Y
Szpankowski, Wojciech
author_sort Aktulga, Hasan Metin
collection PubMed
description Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the [Image: see text] untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's combined DNA index system (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats—an application of importance in genetic profiling.
format Online
Article
Text
id pubmed-3171327
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Springer
record_format MEDLINE/PubMed
spelling pubmed-31713272011-09-13 Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates Aktulga, Hasan Metin Kontoyiannis, Ioannis Lyznik, L Alex Szpankowski, Lukasz Grama, Ananth Y Szpankowski, Wojciech EURASIP J Bioinform Syst Biol Research Article Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the [Image: see text] untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's combined DNA index system (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats—an application of importance in genetic profiling. Springer 2007-12-05 /pmc/articles/PMC3171327/ /pubmed/18301721 http://dx.doi.org/10.1155/2007/14741 Text en Copyright © 2007 Hasan Metin Aktulga et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Aktulga, Hasan Metin
Kontoyiannis, Ioannis
Lyznik, L Alex
Szpankowski, Lukasz
Grama, Ananth Y
Szpankowski, Wojciech
Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title_full Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title_fullStr Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title_full_unstemmed Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title_short Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
title_sort identifying statistical dependence in genomic sequences via mutual information estimates
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171327/
https://www.ncbi.nlm.nih.gov/pubmed/18301721
http://dx.doi.org/10.1155/2007/14741
work_keys_str_mv AT aktulgahasanmetin identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
AT kontoyiannisioannis identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
AT lyzniklalex identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
AT szpankowskilukasz identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
AT gramaananthy identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
AT szpankowskiwojciech identifyingstatisticaldependenceingenomicsequencesviamutualinformationestimates
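
The description in this record says the methodology is based on mutual information between sequence segments. As a minimal sketch of that underlying quantity only, the Python below computes the standard plug-in (empirical) estimate of I(X;Y) in bits from two aligned, equal-length segments. The function name and the position-by-position pairing of symbols are assumptions made for illustration; the threshold function mentioned in the description is not given in this record and is not reproduced here, and this is not claimed to be the authors' exact estimator.

```python
from collections import Counter
from math import log2


def empirical_mutual_information(x: str, y: str) -> float:
    """Plug-in estimate of I(X;Y) in bits from position-aligned symbols.

    Hypothetical helper for illustration: x and y are two segments of
    equal length, and symbol i of x is paired with symbol i of y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(x)
    joint = Counter(zip(x, y))  # counts of each observed pair (a, b)
    px = Counter(x)             # marginal counts for the first segment
    py = Counter(y)             # marginal counts for the second segment

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


if __name__ == "__main__":
    # Identical segments over four equally frequent symbols: I = H(X) = 2 bits.
    print(empirical_mutual_information("ACGT" * 5, "ACGT" * 5))
    # Symbols paired in a perfectly balanced way: the estimate is 0 bits.
    print(empirical_mutual_information("AACC", "ACAC"))
```

With short segments the plug-in estimate is biased upward, so a nonzero value alone does not establish dependence; some significance criterion (such as the threshold the description refers to) is needed to interpret it.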