Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line

Bibliographic Details
Main Authors: Blayney, Jaine K., Davison, Timothy, McCabe, Nuala, Walker, Steven, Keating, Karen, Delaney, Thomas, Greenan, Caroline, Williams, Alistair R., McCluggage, W. Glenn, Capes-Davis, Amanda, Harkin, D. Paul, Gourley, Charlie, Kennedy, Richard D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041471/
https://www.ncbi.nlm.nih.gov/pubmed/27353327
http://dx.doi.org/10.1093/nar/gkw578
collection PubMed
description Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.
id pubmed-5041471
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5041471 2016-09-30 Nucleic Acids Res Methods Online
Oxford University Press 2016-09-30 2016-06-28 /pmc/articles/PMC5041471/ /pubmed/27353327 http://dx.doi.org/10.1093/nar/gkw578 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Methods Online