
Genome comparison using Gene Ontology (GO) with statistical testing

BACKGROUND: Automated comparison of complete sets of genes encoded in two genomes can provide insight into the genetic basis of differences in biological traits between species. Gene Ontology (GO) is used as a common vocabulary to annotate genes for comparison. Current approaches calculate the fold of...


Bibliographic Details
Main Authors: Cai, Zhaotao; Mao, Xizeng; Li, Songgang; Wei, Liping
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1569881/
https://www.ncbi.nlm.nih.gov/pubmed/16901353
http://dx.doi.org/10.1186/1471-2105-7-374
_version_ 1782130228766703616
author Cai, Zhaotao
Mao, Xizeng
Li, Songgang
Wei, Liping
author_facet Cai, Zhaotao
Mao, Xizeng
Li, Songgang
Wei, Liping
author_sort Cai, Zhaotao
collection PubMed
description BACKGROUND: Automated comparison of complete sets of genes encoded in two genomes can provide insight into the genetic basis of differences in biological traits between species. Gene Ontology (GO) is used as a common vocabulary to annotate genes for comparison. Current approaches calculate the fold of unweighted or weighted differences between two species at the high-level GO functional categories. However, to ensure the reliability of the differences detected, it is important to evaluate their statistical significance. It is also useful to search for differences at all levels of GO. RESULTS: We propose a statistical approach to find reliable differences between the complete sets of genes encoded in two genomes at all levels of GO. The genes are first assigned GO terms from BLAST searches against genes with known GO assignments, and for each GO term the abundance of genes in the two genomes is compared using a chi-squared test followed by false discovery rate (FDR) correction. We applied this method to find statistically significant differences between two cyanobacteria, Synechocystis sp. PCC6803 and Anabaena sp. PCC7120. We then studied how the set of identified differences varies when different BLAST cutoffs are used. We also studied how the results vary when only subsets of the genes were used in the comparison of human vs. mouse and that of Saccharomyces cerevisiae vs. Schizosaccharomyces pombe. CONCLUSION: There is a surprising lack of statistical approaches for comparing complete genomes at all levels of GO. With the rapid increase in the number of sequenced genomes, we hope that the approach we proposed and tested can make a valuable contribution to comparative genomics.
format Text
id pubmed-1569881
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1569881 2006-09-16 Genome comparison using Gene Ontology (GO) with statistical testing Cai, Zhaotao Mao, Xizeng Li, Songgang Wei, Liping BMC Bioinformatics Methodology Article
BioMed Central 2006-08-11 /pmc/articles/PMC1569881/ /pubmed/16901353 http://dx.doi.org/10.1186/1471-2105-7-374 Text en Copyright © 2006 Cai et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Cai, Zhaotao
Mao, Xizeng
Li, Songgang
Wei, Liping
Genome comparison using Gene Ontology (GO) with statistical testing
title Genome comparison using Gene Ontology (GO) with statistical testing
title_full Genome comparison using Gene Ontology (GO) with statistical testing
title_fullStr Genome comparison using Gene Ontology (GO) with statistical testing
title_full_unstemmed Genome comparison using Gene Ontology (GO) with statistical testing
title_short Genome comparison using Gene Ontology (GO) with statistical testing
title_sort genome comparison using gene ontology (go) with statistical testing
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1569881/
https://www.ncbi.nlm.nih.gov/pubmed/16901353
http://dx.doi.org/10.1186/1471-2105-7-374
work_keys_str_mv AT caizhaotao genomecomparisonusinggeneontologygowithstatisticaltesting
AT maoxizeng genomecomparisonusinggeneontologygowithstatisticaltesting
AT lisonggang genomecomparisonusinggeneontologygowithstatisticaltesting
AT weiliping genomecomparisonusinggeneontologygowithstatisticaltesting
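
Illustrative sketch (not part of the catalog record): the description field above outlines a per-term comparison in which, for each GO term, the abundance of genes in two genomes is compared with a chi-squared test and the resulting p-values are then corrected for false discovery rate. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Assumptions: a 2x2 Pearson chi-squared test without continuity correction, the Benjamini-Hochberg procedure as the FDR method (the record does not say which one was used), and hypothetical function names, term labels, and counts.

```python
import math


def chi2_p_value(a, n_a, b, n_b):
    """P-value of a 2x2 Pearson chi-squared test (1 degree of freedom).

    a, b     : genes annotated with the GO term in genome A and genome B
    n_a, n_b : total annotated genes in genome A and genome B
    Assumption: no continuity correction is applied.
    """
    c, d = n_a - a, n_b - b            # genes without the term
    total = n_a + n_b
    denom = (a + b) * (c + d) * n_a * n_b
    if denom == 0:
        return 1.0                     # degenerate table: no evidence of a difference
    stat = total * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here only as one common FDR procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):       # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_genomes(counts_a, counts_b, n_a, n_b, alpha=0.05):
    """Return {term: adjusted p-value} for terms significant at level alpha."""
    terms = sorted(set(counts_a) | set(counts_b))
    p_values = [
        chi2_p_value(counts_a.get(t, 0), n_a, counts_b.get(t, 0), n_b)
        for t in terms
    ]
    q_values = benjamini_hochberg(p_values)
    return {t: q for t, q in zip(terms, q_values) if q <= alpha}


if __name__ == "__main__":
    # Hypothetical per-term gene counts; not data from the article.
    genome_a = {"term_x": 50, "term_y": 10}
    genome_b = {"term_x": 10, "term_y": 10}
    print(compare_genomes(genome_a, genome_b, n_a=100, n_b=100))
```

In this sketch the term counts would come from an upstream annotation step (the record mentions BLAST searches against genes with known GO assignments), which is outside the scope of the example.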