
Seq-ing improved gene expression estimates from microarrays using machine learning


Bibliographic Details
Main Authors: Korir, Paul K., Geeleher, Paul, Seoighe, Cathal
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559919/
https://www.ncbi.nlm.nih.gov/pubmed/26338512
http://dx.doi.org/10.1186/s12859-015-0712-z
collection PubMed
description BACKGROUND: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. Nevertheless, microarrays remain in widespread use, demonstrated by the ever-growing numbers of samples deposited in public repositories. RESULTS: We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues. CONCLUSION: This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0712-z) contains supplementary material, which is available to authorized users.
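The abstract states only that MaLTE uses a Random Forest to learn the relationship between the intensities of a set of microarray probes and RNA-Seq transcript expression estimates, trained on samples that have both. The following is a minimal sketch of that general idea for a single transcript, not MaLTE's code: it uses a tiny random forest written from scratch (so it has no dependencies) and simulated data. Every function name, parameter value (50 trees, depth 4, a third of the probes tried per split) and the simulation model (linear, saturating, noisy probes) is a hypothetical assumption chosen for illustration.

```python
import random

# Illustrative sketch only: a tiny from-scratch random forest regressor that
# maps microarray probe intensities to RNA-Seq estimates for ONE transcript.
# All function names, parameters and the simulated data are hypothetical.


def sse(values):
    """Sum of squared deviations from the mean (the split impurity)."""
    if len(values) < 2:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, depth, min_leaf, n_try, rng):
    """Grow a regression tree; each leaf holds a mean RNA-Seq estimate."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None
    # Try a random subset of probes at each split, as in a random forest.
    for f in rng.sample(range(len(X[0])), n_try):
        for cut in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= cut]
            right = [i for i, row in enumerate(X) if row[f] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, cut, left, right)
    if best is None:
        return sum(y) / len(y)
    _, f, cut, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth - 1, min_leaf, n_try, rng)

    return (f, cut, grow(left), grow(right))


def predict_tree(node, x):
    """Follow splits until reaching a leaf value."""
    while isinstance(node, tuple):
        f, cut, left, right = node
        node = left if x[f] <= cut else right
    return node


def fit_forest(X, y, n_trees=50, depth=4, min_leaf=3, seed=0):
    """Bag trees grown on bootstrap resamples of the paired samples."""
    rng = random.Random(seed)
    n_try = max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, min_leaf, n_try, rng))
    return forest


def predict_forest(forest, x):
    """Average the per-tree predictions."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


# Simulated data: 6 probes whose log-intensity rises linearly with the true
# expression level (probe-specific slope and background), saturates at 12,
# and is noisy; the paired RNA-Seq estimate is the true level plus noise.
rng = random.Random(1)
slopes = [rng.uniform(0.5, 1.5) for _ in range(6)]
offsets = [rng.uniform(1.0, 3.0) for _ in range(6)]
X, y = [], []
for _ in range(100):
    level = rng.uniform(0.0, 10.0)
    X.append([min(12.0, s * level + o + rng.gauss(0, 0.3))
              for s, o in zip(slopes, offsets)])
    y.append(level + rng.gauss(0, 0.2))

# Train on 70 "paired" samples; treat the other 30 as microarray-only.
forest = fit_forest(X[:70], y[:70])
predicted = [predict_forest(forest, x) for x in X[70:]]
r = pearson(predicted, y[70:])
mae = sum(abs(p - t) for p, t in zip(predicted, y[70:])) / 30
print(f"held-out Pearson r = {r:.3f}, MAE = {mae:.3f}")
```

Because every leaf averages RNA-Seq training values, the sketch's predictions are expressed on the RNA-Seq scale rather than the probe-intensity scale, which is one way to read the abstract's point about moving microarray estimates onto RNA-Seq's scale. The trade-off is that such a model cannot predict outside the range of expression values seen in the paired training samples.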
id pubmed-4559919
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4559919, 2015-09-05. BMC Bioinformatics, Research Article. BioMed Central, 2015-09-04. Text, en. © Korir et al. 2015. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article