Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

BACKGROUND: Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regula...

Bibliographic Details
Main Authors: Wang, Haiying; Zheng, Huiru; Simpson, David; Azuaje, Francisco
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1421439/
https://www.ncbi.nlm.nih.gov/pubmed/16524483
http://dx.doi.org/10.1186/1471-2105-7-116
author Wang, Haiying
Zheng, Huiru
Simpson, David
Azuaje, Francisco
collection PubMed
description BACKGROUND: Retinal photoreceptors are highly specialised cells that detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors require appropriate regulation of the many genes specifically or highly expressed in these cells. Over recent decades, different experimental approaches have been developed to identify photoreceptor-enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor-enriched genes based on expression data.
RESULTS: Based on the analysis of publicly available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. To understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was first performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than more complex algorithms such as neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re-sampling techniques were studied. A random over-sampling method supported the implementation of the most powerful prediction models. The KStar model was also able to achieve higher predictive sensitivities and specificities using random over-sampling techniques.
CONCLUSION: The approaches assessed in this paper represent an efficient and relatively inexpensive in silico methodology for supporting large-scale analysis of photoreceptor gene expression by SAGE. They may be applied as complementary methodologies to support functional predictions before implementing more comprehensive, experimental prediction and validation methods. They may also be combined with other large-scale, data-driven methods to facilitate the inference of transcriptional regulatory networks in the developing retina. Furthermore, the methodology assessed may be applied to other data domains.
format Text
id pubmed-1421439
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1421439 2006-04-01 Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data Wang, Haiying Zheng, Huiru Simpson, David Azuaje, Francisco BMC Bioinformatics Research Article BioMed Central 2006-03-08 /pmc/articles/PMC1421439/ /pubmed/16524483 http://dx.doi.org/10.1186/1471-2105-7-116 Text en Copyright © 2006 Wang et al; licensee BioMed Central Ltd.
title Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1421439/
https://www.ncbi.nlm.nih.gov/pubmed/16524483
http://dx.doi.org/10.1186/1471-2105-7-116