Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to...
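The description says that a compositional approach can compare gene lists across data sets without normalising expression levels between them. The sketch below illustrates only that general idea and is not the GECA algorithm. It uses the standard centred log-ratio (clr) transform and Aitchison distance from compositional data analysis, and it assumes the platform difference is a single multiplicative factor. The gene values, the factor 7.5, and the function names are all hypothetical.

```python
import math


def clr(values):
    """Centred log-ratio transform of strictly positive values."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(a, b):
    """Euclidean distance between the clr-transformed vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr(a), clr(b))))


# Hypothetical expression values for a four-gene list in one sample.
sample = [120.0, 30.0, 45.0, 5.0]

# The same relative profile on a hypothetical platform whose signal
# is 7.5 times larger overall (an assumption for illustration only).
rescaled = [v * 7.5 for v in sample]

# A hypothetical sample with a genuinely different relative profile.
other = [20.0, 90.0, 45.0, 45.0]

d_same = aitchison_distance(sample, rescaled)   # effectively zero
d_other = aitchison_distance(sample, other)     # clearly non-zero

print(f"same profile, different scale: {d_same:.6f}")
print(f"different profile:             {d_other:.6f}")
```

Because the clr transform subtracts the mean log value, a constant scale factor cancels, so only the relative proportions of the genes in the list affect the distance. Real platform differences are rarely a single global factor, so this shows the motivation for working with compositions rather than a substitute for the published method.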
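Main authors: | Blayney, Jaine K.; Davison, Timothy; McCabe, Nuala; Walker, Steven; Keating, Karen; Delaney, Thomas; Greenan, Caroline; Williams, Alistair R.; McCluggage, W. Glenn; Capes-Davis, Amanda; Harkin, D. Paul; Gourley, Charlie; Kennedy, Richard D. |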
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041471/ https://www.ncbi.nlm.nih.gov/pubmed/27353327 http://dx.doi.org/10.1093/nar/gkw578 |
_version_ | 1782456421040783360 |
---|---|
author | Blayney, Jaine K.; Davison, Timothy; McCabe, Nuala; Walker, Steven; Keating, Karen; Delaney, Thomas; Greenan, Caroline; Williams, Alistair R.; McCluggage, W. Glenn; Capes-Davis, Amanda; Harkin, D. Paul; Gourley, Charlie; Kennedy, Richard D. |
author_sort | Blayney, Jaine K. |
collection | PubMed |
description | Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package. |
format | Online Article Text |
id | pubmed-5041471 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5041471 2016-09-30 Nucleic Acids Res Methods Online Oxford University Press 2016-09-30 2016-06-28 /pmc/articles/PMC5041471/ /pubmed/27353327 http://dx.doi.org/10.1093/nar/gkw578 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041471/ https://www.ncbi.nlm.nih.gov/pubmed/27353327 http://dx.doi.org/10.1093/nar/gkw578 |