
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data


Bibliographic Details
Main Authors: Rue-Albrecht, Kévin; McGettigan, Paul A.; Hernández, Belinda; Nalpas, Nicolas C.; Magee, David A.; Parnell, Andrew C.; Gordon, Stephen V.; MacHugh, David E.
Format: Online; Article; Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788925/
https://www.ncbi.nlm.nih.gov/pubmed/26968614
http://dx.doi.org/10.1186/s12859-016-0971-3

Abstract

BACKGROUND: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.

RESULTS: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.

CONCLUSIONS: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0971-3) contains supplementary material, which is available to authorized users.
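
The abstract describes a two-level ranking: genes are first scored by their importance for classifying samples (by default with a random forest), and gene ontology terms are then ranked from the scores of the genes annotated to them. The Python sketch below illustrates only the second, summarising step, under stated assumptions: gene importance scores are supplied as input (for example, a classifier's variable importance), each term is summarised by the mean score and mean rank of its annotated genes that were actually measured, and terms with too few measured genes are skipped through a hypothetical `min_genes` threshold. The summary rule, the function names and the field names are illustrative assumptions, not the GOexpress API.

```python
from statistics import mean


def rank_genes(importance):
    """Return {gene_id: rank}, where rank 1 is the highest importance score.

    Ties are broken alphabetically by gene identifier so the output is
    deterministic (an assumption made for this sketch).
    """
    ordered = sorted(importance, key=lambda gene: (-importance[gene], gene))
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def score_go_terms(importance, annotations, min_genes=2):
    """Summarise gene-level importance scores into one record per ontology term.

    importance  -- {gene_id: score}, e.g. variable importance from a classifier
    annotations -- {term_id: set of gene_ids annotated to that term}
    min_genes   -- hypothetical filter: terms with fewer scored genes are skipped
    """
    ranks = rank_genes(importance)
    records = []
    for term_id, genes in annotations.items():
        # Only annotated genes that were actually measured (and scored) count.
        scored = sorted(gene for gene in genes if gene in importance)
        if len(scored) < min_genes:
            continue
        records.append({
            "term_id": term_id,
            "n_annotated": len(genes),
            "n_scored": len(scored),
            "mean_score": mean(importance[gene] for gene in scored),
            "mean_rank": mean(ranks[gene] for gene in scored),
        })
    # Most informative terms first: highest mean score, then lowest mean rank.
    records.sort(key=lambda r: (-r["mean_score"], r["mean_rank"], r["term_id"]))
    return records


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the article.
    toy_importance = {"G1": 5.0, "G2": 3.0, "G3": 1.0, "G4": 0.5}
    toy_annotations = {
        "term_A": {"G1", "G2"},
        "term_B": {"G3", "G4", "G9"},  # G9 was not measured, so it is ignored
        "term_C": {"G4"},              # only one scored gene, so it is skipped
    }
    for record in score_go_terms(toy_importance, toy_annotations):
        print(record)
```

Reporting the mean rank alongside the mean score is one way to make the term summary less sensitive to the scale of the importance measure; which of the two to sort by is left here as a design choice.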

Journal: BMC Bioinformatics (Software), published online 2016-03-11
© Rue-Albrecht et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.