
Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein

BACKGROUND: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies on the relation between expression...


Bibliographic Details
Main Authors: Raghava, Gajendra PS, Han, Joon H
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1083413/
https://www.ncbi.nlm.nih.gov/pubmed/15773999
http://dx.doi.org/10.1186/1471-2105-6-59
_version_ 1782123782208487424
author Raghava, Gajendra PS
Han, Joon H
author_facet Raghava, Gajendra PS
Han, Joon H
author_sort Raghava, Gajendra PS
collection PubMed
description BACKGROUND: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies that use expression data to examine the relation between expression level and the composition of the nucleotide or protein sequence. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition. RESULTS: We compute the correlation between the expression of a gene and the amino acid composition of its protein. Some residues (such as Ala, Gly, Arg and Val) have a significant positive correlation (r > 0.20) with gene expression, while others (such as Asp, Leu, Asn and Ser) have a negative correlation (r < -0.15). A significant negative correlation (r = -0.18) was also found between protein length and gene expression. These observations indicate a relationship between percent composition and gene expression level. We therefore developed a Support Vector Machine (SVM) based method for predicting the expression level of a gene from its protein sequence. In this method the SVM is trained on proteins whose gene expression is known in a given condition. The trained SVM is then used to predict the gene expression of other proteins of the same organism in the same condition. A correlation coefficient of r = 0.70 was obtained between predicted and experimentally determined gene expression; this improved to r = 0.72 when dipeptide composition was used instead of residue composition. The method was evaluated using 5-fold cross-validation. We also demonstrate that amino acid composition information, together with gene expression data, can be used to improve the functional classification of proteins. CONCLUSION: There is a correlation between gene expression and amino acid composition that can be used to predict the expression level of genes to a certain extent. A web server based on the above strategy has been developed for calculating the correlation between amino acid composition and gene expression and for predicting expression level. This server will allow users to study the evolution from expression data.
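The two quantities the abstract relies on, percent composition and its correlation with expression, can be illustrated with a minimal Python sketch. This is not the authors' code or their web server: the function names (`amino_acid_composition`, `dipeptide_composition`, `pearson_r`, `residue_expression_correlation`) are hypothetical, the handling of non-standard residues (they are skipped) is an assumption, and the SVM training step is not shown.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Percent composition of each standard residue (20 features)."""
    # Assumption: characters outside the 20 standard residues are ignored.
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percent composition of each ordered residue pair (400 features)."""
    seq = seq.upper()
    # Overlapping adjacent pairs; pairs with a non-standard residue are skipped.
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    return {
        a + b: 100.0 * counts[a + b] / len(pairs)
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def residue_expression_correlation(sequences, expression, residue):
    """Correlate the percent composition of one residue with expression."""
    percents = [amino_acid_composition(s)[residue] for s in sequences]
    return pearson_r(percents, expression)
```

Given a list of protein sequences and a matching list of expression values, `residue_expression_correlation(sequences, expression, "A")` returns the kind of per-residue r value the abstract reports for Ala. The 20-element or 400-element composition vectors are the kind of fixed-length input a regression model such as an SVM could be trained on.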
format Text
id pubmed-1083413
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10834132005-04-21 Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein Raghava, Gajendra PS Han, Joon H BMC Bioinformatics Methodology Article BioMed Central 2005-03-17 /pmc/articles/PMC1083413/ /pubmed/15773999 http://dx.doi.org/10.1186/1471-2105-6-59 Text en Copyright © 2005 Raghava and Han; licensee BioMed Central Ltd.
spellingShingle Methodology Article
Raghava, Gajendra PS
Han, Joon H
Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein
title Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1083413/
https://www.ncbi.nlm.nih.gov/pubmed/15773999
http://dx.doi.org/10.1186/1471-2105-6-59