Generative Adversarial Networks for Creating Synthetic Nucleic Acid Sequences of Cat Genome
Nucleic acids are the basic units of deoxyribonucleic acid (DNA) sequencing. Every organism demonstrates different DNA sequences with specific nucleotides. It reveals the genetic information carried by a particular DNA segment. Nucleic acid sequencing expresses the evolutionary changes among organis...
Main Authors: | Hazra, Debapriya; Kim, Mi-Ryung; Byun, Yung-Cheol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8998662/ https://www.ncbi.nlm.nih.gov/pubmed/35409058 http://dx.doi.org/10.3390/ijms23073701 |
_version_ | 1784684996947083264 |
---|---|
author | Hazra, Debapriya Kim, Mi-Ryung Byun, Yung-Cheol |
author_facet | Hazra, Debapriya Kim, Mi-Ryung Byun, Yung-Cheol |
author_sort | Hazra, Debapriya |
collection | PubMed |
description | Nucleic acids are the basic units of deoxyribonucleic acid (DNA) sequencing. Every organism demonstrates different DNA sequences with specific nucleotides. It reveals the genetic information carried by a particular DNA segment. Nucleic acid sequencing expresses the evolutionary changes among organisms and revolutionizes disease diagnosis in animals. This paper proposes a generative adversarial network (GAN) model to create synthetic nucleic acid sequences of the cat genome tuned to exhibit specific desired properties. We obtained the raw sequence data from Illumina next-generation sequencing. Various data preprocessing steps were performed using the Cutadapt and DADA2 tools. The processed data were fed to the GAN model, which was designed following the architecture of Wasserstein GAN with gradient penalty (WGAN-GP). We introduced a predictor and an evaluator in our proposed GAN model to tune the synthetic sequences to acquire certain realistic properties. The predictor was built for extracting samples with a promoter sequence, and the evaluator was built for filtering samples that scored high for motif-matching. The filtered samples were then passed to the discriminator. We evaluated our model based on multiple metrics and demonstrated outputs for latent interpolation, latent complementation, and motif-matching. Evaluation results showed our proposed GAN model achieved 93.7% correlation with the original data and produced significant outcomes as compared to existing models for sequence generation. |
format | Online Article Text |
id | pubmed-8998662 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8998662 2022-04-12 Generative Adversarial Networks for Creating Synthetic Nucleic Acid Sequences of Cat Genome. Hazra, Debapriya; Kim, Mi-Ryung; Byun, Yung-Cheol. Int J Mol Sci. Article. MDPI 2022-03-28 /pmc/articles/PMC8998662/ /pubmed/35409058 http://dx.doi.org/10.3390/ijms23073701 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Generative Adversarial Networks for Creating Synthetic Nucleic Acid Sequences of Cat Genome |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8998662/ https://www.ncbi.nlm.nih.gov/pubmed/35409058 http://dx.doi.org/10.3390/ijms23073701 |
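The description mentions an evaluator that filters generated samples by how well they score for motif-matching, but this record does not say how that score is computed. The following is a minimal Python sketch of one common way to do it, assuming a log-odds position weight matrix (PWM) slid along each sequence. It is not the authors' implementation: the function names, the TATA-like example motif, the pseudocount, the uniform background, and the threshold are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"  # column order used for one-hot rows and PWM columns


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T).

    Characters outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def log_odds_pwm(motif_instances, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, aligned motif examples.

    Assumes every instance contains only A, C, G, T (hypothetical helper).
    """
    length = len(motif_instances[0])
    pwm = []
    for pos in range(length):
        counts = [pseudocount] * 4
        for inst in motif_instances:
            counts[BASES.index(inst[pos])] += 1.0
        total = sum(counts)
        pwm.append([math.log2((c / total) / background) for c in counts])
    return pwm


def best_motif_score(seq, pwm):
    """Return (score, offset) of the best-scoring PWM window along seq."""
    encoded = one_hot(seq)
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * pwm[i][j]
            for i in range(width)
            for j in range(4)
        )
        if score > best[0]:
            best = (score, start)
    return best


def filter_by_motif(seqs, pwm, threshold):
    """Keep only sequences whose best window scores at least threshold."""
    return [s for s in seqs if best_motif_score(s, pwm)[0] >= threshold]


# Hypothetical TATA-like motif examples, used only to exercise the sketch.
pwm = log_odds_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAA"])
kept = filter_by_motif(["GGCCTATAAAGG", "GGGGCCCCGGGG"], pwm, threshold=5.0)
```

In a setup like the one the description outlines, a filter of this kind would sit between the generator and the discriminator, so that only high-scoring samples are passed on. How the paper integrates the score into training is not stated in this record.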