Generative Adversarial Networks for Creating Synthetic Nucleic Acid Sequences of Cat Genome

Bibliographic Details
Main authors: Hazra, Debapriya; Kim, Mi-Ryung; Byun, Yung-Cheol
Format: Online article, text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8998662/
https://www.ncbi.nlm.nih.gov/pubmed/35409058
http://dx.doi.org/10.3390/ijms23073701
Collection: PubMed
Abstract: Nucleotides are the basic units of deoxyribonucleic acid (DNA), and each organism has distinct DNA sequences made up of specific nucleotides. Sequencing reveals the genetic information carried by a particular DNA segment, reflects evolutionary changes among organisms, and has transformed disease diagnosis in animals. This paper proposes a generative adversarial network (GAN) model that creates synthetic nucleic acid sequences of the cat genome tuned to exhibit specific desired properties. We obtained the raw sequence data from Illumina next-generation sequencing and preprocessed them with the Cutadapt and DADA2 tools. The processed data were fed to a GAN model whose design follows the Wasserstein GAN with gradient penalty (WGAN-GP) architecture. We added a predictor and an evaluator to the model to steer the synthetic sequences toward realistic properties: the predictor extracts samples that contain a promoter sequence, and the evaluator keeps samples that score high for motif matching. The filtered samples are then passed to the discriminator. We evaluated the model on multiple metrics and demonstrated outputs for latent interpolation, latent complementation, and motif matching. The proposed GAN model achieved 93.7% correlation with the original data and produced significant outcomes compared with existing sequence-generation models.
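The abstract states that the evaluator filters generated samples that score high for motif matching before they reach the discriminator, but it does not give the scoring rule. One common way to score motif matches is to scan each sequence with a position weight matrix (PWM) of log-odds scores; the sketch below assumes that approach. The motif probabilities, the uniform 0.25 background, the forward-strand-only scan, and the threshold of 4.0 are illustrative assumptions, and `log_odds_matrix`, `best_motif_score`, and `filter_by_motif` are hypothetical helper names, not the authors' code.

```python
import math
from typing import Dict, List

# Hypothetical 4-position motif ("TATA"-like) as per-position base probabilities.
# Illustrative only; the motifs used in the paper are not given in the abstract.
MOTIF_PROBS: List[Dict[str, float]] = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency for A, C, G, T


def log_odds_matrix(probs: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Turn per-position probabilities into log2-odds scores against the background."""
    return [{base: math.log2(p / BACKGROUND) for base, p in pos.items()} for pos in probs]


def best_motif_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Best log-odds score over every window of the sequence (forward strand only).

    Assumes seq contains only A, C, G, T; returns -inf if seq is shorter than the motif.
    """
    width = len(pwm)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, score)
    return best


def filter_by_motif(samples: List[str], pwm: List[Dict[str, float]], threshold: float) -> List[str]:
    """Keep only generated samples whose best motif score reaches the threshold."""
    return [s for s in samples if best_motif_score(s, pwm) >= threshold]


if __name__ == "__main__":
    pwm = log_odds_matrix(MOTIF_PROBS)
    generated = ["GGTATAGG", "GGGGGGGG", "CCTATCCC"]  # stand-ins for generator output
    print(filter_by_motif(generated, pwm, threshold=4.0))  # ['GGTATAGG']
```

With these illustrative numbers an exact TATA match scores about 5.94 bits and any single mismatch drops the window to about 3.13, so a threshold of 4.0 keeps only sequences containing an exact match. In practice the threshold, and whether to also scan the reverse complement, would be tuned to the motif in use.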
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Mol Sci
Publication date: 2022-03-28
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).