Predicting qualitative phenotypes from microarray data – the Eadgene pig data set

BACKGROUND: The aim of this work was to study the performances of 2 predictive statistical tools on a data set that was given to all participants of the Eadgene-SABRE Post Analyses Working Group, namely the Pig data set of Hazard et al. (2008). The data consisted of 3686 gene expressions measured on...

Bibliographic Details
Main authors:	Robert-Granié, Christèle; Lê Cao, Kim-Anh; SanCristobal, Magali
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712743/ https://www.ncbi.nlm.nih.gov/pubmed/19615113 http://dx.doi.org/10.1186/1753-6561-3-S4-S13

author	Robert-Granié, Christèle Lê Cao, Kim-Anh SanCristobal, Magali
collection	PubMed
description	BACKGROUND: The aim of this work was to study the performances of 2 predictive statistical tools on a data set that was given to all participants of the Eadgene-SABRE Post Analyses Working Group, namely the Pig data set of Hazard et al. (2008). The data consisted of 3686 gene expressions measured on 24 animals partitioned in 2 genotypes and 2 treatments. The objective was to find biomarkers that characterized the genotypes and the treatments in the whole set of genes. METHODS: We first considered the Random Forest approach that enables the selection of predictive variables. We then compared the classical Partial Least Squares regression (PLS) with a novel approach called sparse PLS, a variant of PLS that adapts lasso penalization and allows for the selection of a subset of variables. RESULTS: All methods performed well on this data set. The sparse PLS outperformed the PLS in terms of prediction performance and improved the interpretability of the results. CONCLUSION: We recommend the use of machine learning methods such as Random Forest and multivariate methods such as sparse PLS for prediction purposes. Both approaches are well adapted to transcriptomic data where the number of features is much greater than the number of individuals.
format	Text
id	pubmed-2712743
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2712743 2009-07-20 Predicting qualitative phenotypes from microarray data – the Eadgene pig data set Robert-Granié, Christèle Lê Cao, Kim-Anh SanCristobal, Magali BMC Proc Research BioMed Central 2009-07-16 /pmc/articles/PMC2712743/ /pubmed/19615113 http://dx.doi.org/10.1186/1753-6561-3-S4-S13 Text en Copyright © 2009 Robert-Granié et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting qualitative phenotypes from microarray data – the Eadgene pig data set
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712743/ https://www.ncbi.nlm.nih.gov/pubmed/19615113 http://dx.doi.org/10.1186/1753-6561-3-S4-S13

Similar Items