Generation of Gene Ontology benchmark datasets with various types of positive signal
BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis which need to be linked to functional information repres...
Main authors: | Törönen, Petri; Pehkonen, Petri; Holm, Liisa |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762998/ https://www.ncbi.nlm.nih.gov/pubmed/19811632 http://dx.doi.org/10.1186/1471-2105-10-319 |
_version_ | 1782172973066616832 |
---|---|
author | Törönen, Petri Pehkonen, Petri Holm, Liisa |
author_facet | Törönen, Petri Pehkonen, Petri Holm, Liisa |
author_sort | Törönen, Petri |
collection | PubMed |
description | BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis, which need to be linked to functional information represented in the Gene Ontology (GO). Despite the importance of this procedure, there is little work on consistent evaluation of various GO analysis methods. In particular, there is no literature on creating benchmark datasets for GO analysis tools. RESULTS: We propose a methodology for the evaluation of GO analysis tools, which consists of creating gene lists with a selected signal level and a selected number of independent over-represented classes. The methodology starts with a real-life GO data matrix, and therefore the generated datasets have similar features to real positive datasets. The user can select the signal level for over-representation, the number of independent positive classes in the dataset, and the size of the final gene list. We present the use of the effective number and various normalizations while embedding the signal in a selected class or classes, and the use of binary correlation to ensure that the selected signal classes are independent of each other. The usefulness of the generated datasets is demonstrated by comparing different GO class ranking and GO clustering methods. CONCLUSION: The presented methods aid the development and evaluation of GO analysis methods, as they enable thorough testing with different signal types and different signal levels. As an example, our comparisons reveal clear differences between the compared GO clustering and GO de-correlation methods. The implementation is coded in Matlab and is freely available at the dedicated website. |
format | Text |
id | pubmed-2762998 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2762998 2009-10-17 Generation of Gene Ontology benchmark datasets with various types of positive signal Törönen, Petri Pehkonen, Petri Holm, Liisa BMC Bioinformatics Methodology Article BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis, which need to be linked to functional information represented in the Gene Ontology (GO). Despite the importance of this procedure, there is little work on consistent evaluation of various GO analysis methods. In particular, there is no literature on creating benchmark datasets for GO analysis tools. RESULTS: We propose a methodology for the evaluation of GO analysis tools, which consists of creating gene lists with a selected signal level and a selected number of independent over-represented classes. The methodology starts with a real-life GO data matrix, and therefore the generated datasets have similar features to real positive datasets. The user can select the signal level for over-representation, the number of independent positive classes in the dataset, and the size of the final gene list. We present the use of the effective number and various normalizations while embedding the signal in a selected class or classes, and the use of binary correlation to ensure that the selected signal classes are independent of each other. The usefulness of the generated datasets is demonstrated by comparing different GO class ranking and GO clustering methods. CONCLUSION: The presented methods aid the development and evaluation of GO analysis methods, as they enable thorough testing with different signal types and different signal levels. As an example, our comparisons reveal clear differences between the compared GO clustering and GO de-correlation methods. The implementation is coded in Matlab and is freely available at the dedicated website. BioMed Central 2009-10-07 /pmc/articles/PMC2762998/ /pubmed/19811632 http://dx.doi.org/10.1186/1471-2105-10-319 Text en Copyright © 2009 Törönen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Törönen, Petri Pehkonen, Petri Holm, Liisa Generation of Gene Ontology benchmark datasets with various types of positive signal |
title | Generation of Gene Ontology benchmark datasets with various types of positive signal |
title_full | Generation of Gene Ontology benchmark datasets with various types of positive signal |
title_fullStr | Generation of Gene Ontology benchmark datasets with various types of positive signal |
title_full_unstemmed | Generation of Gene Ontology benchmark datasets with various types of positive signal |
title_short | Generation of Gene Ontology benchmark datasets with various types of positive signal |
title_sort | generation of gene ontology benchmark datasets with various types of positive signal |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762998/ https://www.ncbi.nlm.nih.gov/pubmed/19811632 http://dx.doi.org/10.1186/1471-2105-10-319 |
work_keys_str_mv | AT toronenpetri generationofgeneontologybenchmarkdatasetswithvarioustypesofpositivesignal AT pehkonenpetri generationofgeneontologybenchmarkdatasetswithvarioustypesofpositivesignal AT holmliisa generationofgeneontologybenchmarkdatasetswithvarioustypesofpositivesignal |
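
The abstract describes building benchmark gene lists from a GO data matrix by choosing signal classes that are independent of each other (checked with binary correlation) and embedding a chosen signal level in a list of a chosen size. The following Python sketch illustrates that general idea only. It is not the authors' Matlab implementation, and it does not reproduce their effective number or normalization steps. All function names, the dictionary-of-sets representation of the GO matrix, and the `max_corr` and `signal_fraction` parameters are assumptions made for illustration; binary correlation is taken here to be the phi coefficient of two 0/1 membership vectors.

```python
import math
import random


def binary_correlation(a, b, n_genes):
    """Phi coefficient: Pearson correlation of two 0/1 membership vectors,
    computed from the gene sets `a` and `b` over `n_genes` genes."""
    n11 = len(a & b)                      # genes in both classes
    n10 = len(a) - n11                    # genes only in a
    n01 = len(b) - n11                    # genes only in b
    n00 = n_genes - n11 - n10 - n01       # genes in neither class
    denom = math.sqrt(len(a) * (n_genes - len(a)) * len(b) * (n_genes - len(b)))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def pick_independent_classes(classes, n_genes, k, max_corr=0.1, seed=None):
    """Greedily pick k class ids whose pairwise |correlation| is <= max_corr.
    `max_corr` is a hypothetical threshold, not a value from the paper."""
    rng = random.Random(seed)
    ids = sorted(classes)
    rng.shuffle(ids)
    chosen = []
    for cid in ids:
        if all(abs(binary_correlation(classes[cid], classes[c], n_genes)) <= max_corr
               for c in chosen):
            chosen.append(cid)
        if len(chosen) == k:
            return chosen
    raise ValueError("not enough weakly correlated classes")


def sample_gene_list(classes, signal_ids, n_genes, list_size, signal_fraction, seed=None):
    """Draw `list_size` distinct genes so that about `signal_fraction` of them
    belong to the signal classes; the rest come from genes outside them."""
    rng = random.Random(seed)
    members = set().union(*(classes[c] for c in signal_ids))
    others = set(range(n_genes)) - members
    n_signal = min(round(signal_fraction * list_size), len(members))
    n_background = list_size - n_signal
    if n_background > len(others):
        raise ValueError("list_size too large for the available background genes")
    picked = rng.sample(sorted(members), n_signal) + rng.sample(sorted(others), n_background)
    return sorted(picked)


# Toy data: 100 genes, three classes; A and B are identical, C is disjoint from both.
toy_classes = {"A": set(range(20)), "B": set(range(20)), "C": set(range(50, 70))}
signal = pick_independent_classes(toy_classes, 100, k=2, max_corr=0.3, seed=0)
gene_list = sample_gene_list(toy_classes, signal[:1], 100, list_size=10,
                             signal_fraction=0.6, seed=0)
```

Raising `signal_fraction` strengthens the over-representation of the chosen classes, and lowering `max_corr` enforces stricter independence between them. Both are knobs of this sketch rather than parameters documented in the record.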