Generation of Gene Ontology benchmark datasets with various types of positive signal

BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis which need to be linked to functional information repres...

Bibliographic Details
Main authors: Törönen, Petri, Pehkonen, Petri, Holm, Liisa
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762998/
https://www.ncbi.nlm.nih.gov/pubmed/19811632
http://dx.doi.org/10.1186/1471-2105-10-319
author Törönen, Petri
Pehkonen, Petri
Holm, Liisa
collection PubMed
description BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis, which need to be linked to functional information represented in the Gene Ontology (GO). Despite the importance of this procedure, there is little work on consistent evaluation of various GO analysis methods. In particular, there is no literature on creating benchmark datasets for GO analysis tools. RESULTS: We propose a methodology for the evaluation of GO analysis tools, which consists of creating gene lists with a selected signal level and a selected number of independent over-represented classes. The methodology starts with a real-life GO data matrix, and therefore the generated datasets have similar features to real positive datasets. The user can select the signal level for over-representation, the number of independent positive classes in the dataset, and the size of the final gene list. We present the use of the effective number and various normalizations while embedding the signal into a selected class or classes, and the use of binary correlation to ensure that the selected signal classes are independent of each other. The usefulness of the generated datasets is demonstrated by comparing different GO class ranking and GO clustering methods. CONCLUSION: The presented methods aid the development and evaluation of GO analysis methods, as they enable thorough testing with different signal types and different signal levels. As an example, our comparisons reveal clear differences between the compared GO clustering and GO de-correlation methods. The implementation is coded in Matlab and is freely available at the dedicated website.
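The benchmark idea summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation and it omits the effective number and the normalizations the abstract mentions. It only assumes that GO classes are given as sets of gene indices, checks how correlated two classes are with a binary (phi) correlation, embeds a chosen number of genes from one class into a random gene list, and scores the resulting over-representation with a hypergeometric tail probability. All names (`binary_correlation`, `hypergeom_tail`, `make_gene_list`, the toy classes `A`, `B`, `C`) are hypothetical.

```python
import math
import random


def binary_correlation(a, b, universe_size):
    """Phi (Pearson) correlation between two classes given as sets of gene indices."""
    n = universe_size
    na, nb, nab = len(a), len(b), len(a & b)
    denom = math.sqrt(na * (n - na) * nb * (n - nb))
    return (n * nab - na * nb) / denom if denom else 0.0


def hypergeom_tail(k, universe_size, class_size, list_size):
    """P(X >= k): chance of seeing at least k class members in a random gene list."""
    total = math.comb(universe_size, list_size)
    upper = min(class_size, list_size)
    hits = sum(
        math.comb(class_size, i)
        * math.comb(universe_size - class_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def make_gene_list(signal_class, universe_size, list_size, n_signal, rng):
    """Build a gene list with at least n_signal genes drawn from signal_class."""
    signal_genes = set(rng.sample(sorted(signal_class), n_signal))
    # The rest of the list is random background; it may contain extra class
    # members by chance, so n_signal is a lower bound on the overlap.
    background = [g for g in range(universe_size) if g not in signal_genes]
    return signal_genes | set(rng.sample(background, list_size - n_signal))


# Toy data (assumed, not taken from any real GO annotation matrix).
UNIVERSE = 1000
classes = {
    "A": set(range(0, 50)),
    "B": set(range(40, 90)),    # overlaps A, so the two are positively correlated
    "C": set(range(500, 550)),  # disjoint from A
}

phi_ab = binary_correlation(classes["A"], classes["B"], UNIVERSE)
phi_ac = binary_correlation(classes["A"], classes["C"], UNIVERSE)

rng = random.Random(0)
gene_list = make_gene_list(classes["A"], UNIVERSE, list_size=100, n_signal=20, rng=rng)
overlap = len(gene_list & classes["A"])
p_value = hypergeom_tail(overlap, UNIVERSE, len(classes["A"]), len(gene_list))
```

In this sketch, `n_signal` plays the role of the user-selected signal level and `list_size` the size of the final gene list. A low absolute value from `binary_correlation` could serve as a simple criterion for picking several signal classes that do not share most of their genes.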
format Text
id pubmed-2762998
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2762998 2009-10-17 Generation of Gene Ontology benchmark datasets with various types of positive signal Törönen, Petri Pehkonen, Petri Holm, Liisa BMC Bioinformatics Methodology Article
BioMed Central 2009-10-07 /pmc/articles/PMC2762998/ /pubmed/19811632 http://dx.doi.org/10.1186/1471-2105-10-319 Text en Copyright © 2009 Törönen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Generation of Gene Ontology benchmark datasets with various types of positive signal
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762998/
https://www.ncbi.nlm.nih.gov/pubmed/19811632
http://dx.doi.org/10.1186/1471-2105-10-319