
GOParGenPy: a high throughput method to generate Gene Ontology data matrices

BACKGROUND: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these too...


Bibliographic Details
Main authors: Kumar, Ajay Anand, Holm, Liisa, Toronen, Petri
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750654/
https://www.ncbi.nlm.nih.gov/pubmed/23927037
http://dx.doi.org/10.1186/1471-2105-14-242
_version_ 1782281462864674816
author Kumar, Ajay Anand
Holm, Liisa
Toronen, Petri
author_facet Kumar, Ajay Anand
Holm, Liisa
Toronen, Petri
author_sort Kumar, Ajay Anand
collection PubMed
description BACKGROUND: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. RESULTS: We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. CONCLUSIONS: GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
format Online
Article
Text
id pubmed-3750654
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3750654
2013-08-24
GOParGenPy: a high throughput method to generate Gene Ontology data matrices
Kumar, Ajay Anand
Holm, Liisa
Toronen, Petri
BMC Bioinformatics
Software
BACKGROUND: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. RESULTS: We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. CONCLUSIONS: GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
BioMed Central
2013-08-08
/pmc/articles/PMC3750654/
/pubmed/23927037
http://dx.doi.org/10.1186/1471-2105-14-242
Text
en
Copyright © 2013 Kumar et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Kumar, Ajay Anand
Holm, Liisa
Toronen, Petri
GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title_full GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title_fullStr GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title_full_unstemmed GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title_short GOParGenPy: a high throughput method to generate Gene Ontology data matrices
title_sort gopargenpy: a high throughput method to generate gene ontology data matrices
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750654/
https://www.ncbi.nlm.nih.gov/pubmed/23927037
http://dx.doi.org/10.1186/1471-2105-14-242
work_keys_str_mv AT kumarajayanand gopargenpyahighthroughputmethodtogenerategeneontologydatamatrices
AT holmliisa gopargenpyahighthroughputmethodtogenerategeneontologydatamatrices
AT toronenpetri gopargenpyahighthroughputmethodtogenerategeneontologydatamatrices
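
The abstract in this record describes a binary data matrix of GO class membership that includes parental classes and can be produced in sparse or full form. The following is a minimal sketch of that general idea, not GOParGenPy's actual code or interface. The toy ontology, the gene names, and the function names `ancestors` and `membership_matrix` are all hypothetical, and the sketch assumes the ontology is acyclic.

```python
# Hypothetical toy ontology: each GO term maps to its direct parent terms.
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

# Hypothetical direct annotations: each gene maps to its annotated GO terms.
ANNOTATIONS = {
    "gene1": {"GO:C"},
    "gene2": {"GO:D"},
    "gene3": {"GO:B", "GO:D"},
}


def ancestors(term, parents, cache):
    """Return the term itself plus every ancestor reachable via parent links.

    Assumes the ontology graph is acyclic; results are memoized in `cache`.
    """
    if term in cache:
        return cache[term]
    result = {term}
    for parent in parents.get(term, ()):
        result |= ancestors(parent, parents, cache)
    cache[term] = result
    return result


def membership_matrix(annotations, parents):
    """Build sparse and full binary gene-by-term membership matrices."""
    cache = {}
    # Expand each gene's direct annotations to include all parental classes.
    expanded = {
        gene: set().union(*(ancestors(t, parents, cache) for t in terms))
        for gene, terms in annotations.items()
    }
    genes = sorted(expanded)
    terms = sorted(set().union(*expanded.values()))
    column = {term: j for j, term in enumerate(terms)}
    # Sparse form: sorted (row, column) pairs for every nonzero entry.
    sparse = sorted(
        (i, column[t]) for i, gene in enumerate(genes) for t in expanded[gene]
    )
    # Full form: dense list of 0/1 rows, one row per gene.
    full = [[0] * len(terms) for _ in genes]
    for i, j in sparse:
        full[i][j] = 1
    return genes, terms, sparse, full


if __name__ == "__main__":
    genes, terms, sparse, full = membership_matrix(ANNOTATIONS, PARENTS)
    print("terms:", terms)
    for gene, row in zip(genes, full):
        print(gene, row)
```

The sparse form stores only the coordinates of the nonzero entries, which is the natural choice when most genes belong to a small fraction of all GO classes. The full form is easier to pass to tools that expect a dense table.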