
Privacy preserving validation for multiomic prediction models

Reproducibility of results obtained using ribonucleic acid (RNA) data across labs remains a major hurdle in cancer research. Often, molecular predictors trained on one dataset cannot be applied to another due to differences in RNA library preparation and quantification, which inhibits the validation...


Bibliographic Details
Main authors: Ahmed, Talal; Carty, Mark A; Wenric, Stephane; Dry, Jonathan R; Salahudeen, Ameen A; Khan, Aly A; Lefkofsky, Eric; Stumpe, Martin C; Pelossof, Raphael
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116386/
https://www.ncbi.nlm.nih.gov/pubmed/35388408
http://dx.doi.org/10.1093/bib/bbac110
author Ahmed, Talal
Carty, Mark A
Wenric, Stephane
Dry, Jonathan R
Salahudeen, Ameen A
Khan, Aly A
Lefkofsky, Eric
Stumpe, Martin C
Pelossof, Raphael
author_sort Ahmed, Talal
collection PubMed
description Reproducibility of results obtained using ribonucleic acid (RNA) data across labs remains a major hurdle in cancer research. Often, molecular predictors trained on one dataset cannot be applied to another due to differences in RNA library preparation and quantification, which inhibits the validation of predictors across labs. While current RNA correction algorithms reduce these differences, they require simultaneous access to patient-level data from all datasets, which necessitates the sharing of training data for predictors when sharing predictors. Here, we describe SpinAdapt, an unsupervised RNA correction algorithm that enables the transfer of molecular models without requiring access to patient-level data. It computes data corrections only via aggregate statistics of each dataset, thereby maintaining patient data privacy. Despite an inherent trade-off between privacy and performance, SpinAdapt outperforms current correction methods, like Seurat and ComBat, on publicly available cancer studies, including TCGA and ICGC. Furthermore, SpinAdapt can correct new samples, thereby enabling unbiased evaluation on validation cohorts. We expect this novel correction paradigm to enhance research reproducibility and to preserve patient privacy.
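The idea of correcting one dataset against another using only aggregate statistics can be illustrated with a minimal sketch. This is not the SpinAdapt algorithm, whose details are not given in this record; it is a simple per-gene moment-matching stand-in. The function names (`summarize`, `fit_correction`, `apply_correction`) and the synthetic data are assumptions made for illustration. The only information that crosses between the two sites is each gene's mean and standard deviation, never patient-level rows.

```python
import numpy as np

def summarize(X):
    """Aggregate statistics of a samples-by-genes matrix (hypothetical helper)."""
    return {"mean": X.mean(axis=0), "std": X.std(axis=0, ddof=1)}

def fit_correction(own_stats, reference_stats, eps=1e-8):
    """Per-gene affine map that moves a dataset's mean/std onto the reference's.

    Uses only the two sets of aggregate statistics, not the raw samples.
    """
    scale = reference_stats["std"] / (own_stats["std"] + eps)
    shift = reference_stats["mean"] - scale * own_stats["mean"]
    return scale, shift

def apply_correction(X, scale, shift):
    """Apply a previously fitted correction to any samples, including new ones."""
    return X * scale + shift

# Synthetic stand-ins for two labs' expression matrices (samples x genes).
rng = np.random.default_rng(0)
reference = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
other = 1.5 * rng.normal(loc=5.0, scale=2.0, size=(150, 4)) + 3.0  # simulated lab shift

# Each site computes its own summary; only these summaries are exchanged.
reference_stats = summarize(reference)
other_stats = summarize(other)

scale, shift = fit_correction(other_stats, reference_stats)
corrected = apply_correction(other, scale, shift)

# A held-out sample is corrected with the same fitted map, without refitting.
new_sample = 1.5 * rng.normal(loc=5.0, scale=2.0, size=(1, 4)) + 3.0
new_corrected = apply_correction(new_sample, scale, shift)
```

Because the map is fitted once from summaries and then frozen, held-out samples can be corrected without being used to fit it, which is the property the abstract associates with unbiased evaluation on validation cohorts.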
format Online
Article
Text
id pubmed-9116386
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9116386 2022-05-19 Brief Bioinform, Problem Solving Protocol. Oxford University Press 2022-04-06 /pmc/articles/PMC9116386/ /pubmed/35388408 http://dx.doi.org/10.1093/bib/bbac110 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Privacy preserving validation for multiomic prediction models
topic Problem Solving Protocol