Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

Bibliographic Details
Main authors: Dahinden, Corinne; Parmigiani, Giovanni; Emerick, Mark C; Bühlmann, Peter
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2233645/
https://www.ncbi.nlm.nih.gov/pubmed/18072965
http://dx.doi.org/10.1186/1471-2105-8-476
collection PubMed
description BACKGROUND: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. RESULTS: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. CONCLUSION: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems.
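The abstract's central technical point, that maximum likelihood breaks down on tables with empty cells while an ℓ1 penalty keeps estimation well posed, can be illustrated with a small sketch. The Python code below is not the authors' method or software: it is a generic Lasso-type fit of a Poisson log-linear model to a 2^K table, minimizing Σ_i (μ_i − y_i log μ_i) + λ Σ_{j≥1} |β_j| with log μ = Xβ, by proximal gradient descent (ISTA) with backtracking. The ±1 effect coding, the unpenalized intercept, the function names (design_matrix, lasso_loglinear and so on), the value λ = 5 and the toy counts are all hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def design_matrix(n_factors):
    """Effect-coded (+1/-1) design for a 2^K table with all interactions up to
    order K (the saturated log-linear model). Rows follow itertools.product order."""
    cells = np.array(list(itertools.product([1.0, -1.0], repeat=n_factors)))
    cols, names = [], []
    for order in range(n_factors + 1):
        for subset in itertools.combinations(range(n_factors), order):
            col = np.ones(len(cells))
            for j in subset:  # product of the chosen factor codes; empty subset -> intercept
                col = col * cells[:, j]
            cols.append(col)
            names.append("".join("ABCDEFGH"[j] for j in subset) or "(Intercept)")
    return np.column_stack(cols), names


def poisson_nll(beta, X, y):
    """Poisson negative log-likelihood, dropping the constant sum(log y!)."""
    eta = X @ beta
    with np.errstate(over="ignore"):  # trial steps may overflow; they are then rejected
        return float(np.sum(np.exp(eta) - y * eta))


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def penalized_objective(beta, X, y, lam):
    # Hypothetical convention: the intercept (column 0) is not penalized.
    return poisson_nll(beta, X, y) + lam * np.abs(beta[1:]).sum()


def lasso_loglinear(X, y, lam, n_iter=5000, tol=1e-9):
    """ISTA with backtracking for the l1-penalized Poisson log-linear model.
    Assumes y.sum() > 0."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit (all cells equal)
    step = 1.0
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta) - y)
        f_old = poisson_nll(beta, X, y)
        while True:
            z = beta - step * grad
            new = z.copy()
            new[1:] = soft_threshold(z[1:], step * lam)
            diff = new - beta
            # Accept once the quadratic upper bound on the smooth part holds.
            if poisson_nll(new, X, y) <= f_old + grad @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        beta = new
        if np.max(np.abs(diff)) < tol:
            break
    return beta


# Hypothetical toy counts for three binary exon-presence indicators A, B, C
# (cells in itertools.product order); one cell is empty. Not data from the paper.
X, names = design_matrix(3)
y = np.array([40.0, 12.0, 10.0, 0.0, 9.0, 3.0, 2.0, 1.0])

# Saturated MLE: here X.T @ X = 8 I, so X^{-1} = X.T / 8 and log(mu_hat) = log(y);
# the empty cell gives log(0) = -inf, so some coefficients diverge.
with np.errstate(divide="ignore"):
    beta_mle = X.T @ np.log(y) / len(y)
print(np.isfinite(beta_mle).all())  # False

# The l1-penalized fit stays finite; lam=5.0 is an arbitrary illustrative value.
beta_hat = lasso_loglinear(X, y, lam=5.0)
for name, b in zip(names, beta_hat):
    print(f"{name:>12s} {b: .3f}")
```

For the saturated model the design matrix is square, so the fitted means must equal the observed counts and the empty cell forces log μ = −∞; the penalty instead shrinks the interaction terms and keeps every estimate finite. Larger λ tends to set more interaction coefficients exactly to zero, which is what lets an ℓ1 approach double as model selection; how the paper chooses λ is not reproduced here.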
id pubmed-2233645
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics, Methodology Article. Published by BioMed Central, 2007-12-11.
Copyright © 2007 Dahinden et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article