Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries
BACKGROUND: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity...
Main authors: | Dahinden, Corinne; Parmigiani, Giovanni; Emerick, Mark C; Bühlmann, Peter |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2233645/ https://www.ncbi.nlm.nih.gov/pubmed/18072965 http://dx.doi.org/10.1186/1471-2105-8-476 |
_version_ | 1782150274155020288 |
---|---|
author | Dahinden, Corinne Parmigiani, Giovanni Emerick, Mark C Bühlmann, Peter |
author_facet | Dahinden, Corinne Parmigiani, Giovanni Emerick, Mark C Bühlmann, Peter |
author_sort | Dahinden, Corinne |
collection | PubMed |
description | BACKGROUND: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. RESULTS: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ₁-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. CONCLUSION: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables. |
format | Text |
id | pubmed-2233645 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2233645 2008-02-07 Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries Dahinden, Corinne Parmigiani, Giovanni Emerick, Mark C Bühlmann, Peter BMC Bioinformatics Methodology Article BACKGROUND: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. RESULTS: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ₁-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. CONCLUSION: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables. BioMed Central 2007-12-11 /pmc/articles/PMC2233645/ /pubmed/18072965 http://dx.doi.org/10.1186/1471-2105-8-476 Text en Copyright © 2007 Dahinden et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Dahinden, Corinne Parmigiani, Giovanni Emerick, Mark C Bühlmann, Peter Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title_full | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title_fullStr | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title_full_unstemmed | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title_short | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries |
title_sort | penalized likelihood for sparse contingency tables with an application to full-length cdna libraries |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2233645/ https://www.ncbi.nlm.nih.gov/pubmed/18072965 http://dx.doi.org/10.1186/1471-2105-8-476 |
work_keys_str_mv | AT dahindencorinne penalizedlikelihoodforsparsecontingencytableswithanapplicationtofulllengthcdnalibraries AT parmigianigiovanni penalizedlikelihoodforsparsecontingencytableswithanapplicationtofulllengthcdnalibraries AT emerickmarkc penalizedlikelihoodforsparsecontingencytableswithanapplicationtofulllengthcdnalibraries AT buhlmannpeter penalizedlikelihoodforsparsecontingencytableswithanapplicationtofulllengthcdnalibraries |