FANCY: fast estimation of privacy risk in functional genomics data

MOTIVATION: Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking va...

Bibliographic Details
Main authors: Gürsoy, Gamze, Brannon, Charlotte M, Navarro, Fabio C P, Gerstein, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7850135/
https://www.ncbi.nlm.nih.gov/pubmed/32726397
http://dx.doi.org/10.1093/bioinformatics/btaa661
author Gürsoy, Gamze
Brannon, Charlotte M
Navarro, Fabio C P
Gerstein, Mark
collection PubMed
description MOTIVATION: Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. RESULTS: FANCY can predict the cumulative number of leaking SNVs with an average R² of 0.95 for all independent test sets. We realize the importance of accurate prediction when the number of leaked variants is low. Thus, we develop a special version of the model, which can make predictions with higher accuracy when the number of leaking variants is low. AVAILABILITY AND IMPLEMENTATION: A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7850135
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7850135 2021-02-03 FANCY: fast estimation of privacy risk in functional genomics data Gürsoy, Gamze Brannon, Charlotte M Navarro, Fabio C P Gerstein, Mark Bioinformatics Original Papers Oxford University Press 2020-07-29 /pmc/articles/PMC7850135/ /pubmed/32726397 http://dx.doi.org/10.1093/bioinformatics/btaa661 Text en © The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title FANCY: fast estimation of privacy risk in functional genomics data
topic Original Papers