Creating artificial human genomes using generative neural networks

Bibliographic Details
Main authors: Yelmen, Burak, Decelle, Aurélien, Ongaro, Linda, Marnetto, Davide, Tallec, Corentin, Montinaro, Francesco, Furtlehner, Cyril, Pagani, Luca, Jay, Flora
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861435/
https://www.ncbi.nlm.nih.gov/pubmed/33539374
http://dx.doi.org/10.1371/journal.pgen.1009303
author Yelmen, Burak
Decelle, Aurélien
Ongaro, Linda
Marnetto, Davide
Tallec, Corentin
Montinaro, Francesco
Furtlehner, Cyril
Pagani, Luca
Jay, Flora
collection PubMed
description Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is restricted access to many genetic databases due to individual privacy concerns, even though these databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrate that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we show that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, allowing further exploration of the reference dataset and providing features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.
format Online
Article
Text
id pubmed-7861435
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7861435 2021-02-12 PLoS Genet Research Article Public Library of Science 2021-02-04 Text en © 2021 Yelmen et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Creating artificial human genomes using generative neural networks
topic Research Article