
Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN

Accurate prediction of cancer stage is important in that it enables more appropriate treatment for patients with cancer. Many measures or methods have been proposed for more accurate prediction of cancer stage, but recently, machine learning, especially deep learning-based methods have been receivin...


Bibliographic Details
Main Authors:	Kwon, ChangHyuk; Park, Sangjin; Ko, Soohyun; Ahn, Jaegyoon
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8078779/ https://www.ncbi.nlm.nih.gov/pubmed/33905431 http://dx.doi.org/10.1371/journal.pone.0250458

_version_	1783685102063058944
author	Kwon, ChangHyuk Park, Sangjin Ko, Soohyun Ahn, Jaegyoon
author_facet	Kwon, ChangHyuk Park, Sangjin Ko, Soohyun Ahn, Jaegyoon
author_sort	Kwon, ChangHyuk
collection	PubMed
description	Accurate prediction of cancer stage is important because it enables more appropriate treatment for patients with cancer. Many measures and methods have been proposed for more accurate prediction of cancer stage; recently, machine learning, especially deep learning-based methods, has been receiving increasing attention, mostly owing to its good prediction accuracy in many applications. Machine learning methods can be applied to high-throughput DNA mutation or RNA expression data to predict cancer stage. However, because the number of genes or markers generally exceeds 10,000, a considerable number of data samples is required to guarantee high prediction accuracy. To address the small number of clinical samples, we used Generative Adversarial Networks (GANs) to augment the samples. Because GANs are not effective with whole genes, we first selected significant genes using DNA mutation data and random forest feature ranking. Next, RNA expression data for the selected genes were expanded using GANs. We compared the classification accuracies obtained with the original dataset and with expanded datasets generated by the proposed and existing methods, using random forest, Deep Neural Networks (DNNs), and 1-Dimensional Convolutional Neural Networks (1DCNN). When using the 1DCNN, the F1 score of GAN5 (a 5-fold increase in data) was improved by 39% in relation to the original data. Moreover, the results using only 30% of the data were better than those using all of the data. Our attempt is the first to use GAN for augmentation using numeric data for both DNA and RNA. The augmented datasets obtained using the proposed method demonstrated significantly increased classification accuracy for most cases. By using GAN and 1DCNN in the prediction of cancer stage, we confirmed that good results can be obtained even with a small number of samples, and it is expected that a great deal of the cost and time required to obtain clinical samples will be reduced. The proposed sample augmentation method could also be applied for other purposes, such as prognostic prediction or cancer classification.
format	Online Article Text
id	pubmed-8078779
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8078779 2021-05-05 Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN Kwon, ChangHyuk Park, Sangjin Ko, Soohyun Ahn, Jaegyoon PLoS One Research Article Public Library of Science 2021-04-27 /pmc/articles/PMC8078779/ /pubmed/33905431 http://dx.doi.org/10.1371/journal.pone.0250458 Text en © 2021 Kwon et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Kwon, ChangHyuk Park, Sangjin Ko, Soohyun Ahn, Jaegyoon Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN
title	Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8078779/ https://www.ncbi.nlm.nih.gov/pubmed/33905431 http://dx.doi.org/10.1371/journal.pone.0250458
work_keys_str_mv	AT kwonchanghyuk increasingpredictionaccuracyofpathogenicstagingbysampleaugmentationwithagan AT parksangjin increasingpredictionaccuracyofpathogenicstagingbysampleaugmentationwithagan AT kosoohyun increasingpredictionaccuracyofpathogenicstagingbysampleaugmentationwithagan AT ahnjaegyoon increasingpredictionaccuracyofpathogenicstagingbysampleaugmentationwithagan
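
The description in this record reports classifier quality as an F1 score and an improvement "in relation to the original data". As a minimal sketch of how such figures could be computed, the Python below assumes macro-averaged F1 over stage labels and treats the improvement as a relative change. Both are assumptions: the record does not state which averaging or which kind of percentage the authors used. The function names and the toy stage labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def relative_improvement(baseline, improved):
    """Fractional change of `improved` over `baseline` (0.39 means +39%)."""
    return (improved - baseline) / baseline


# Hypothetical toy labels standing in for cancer stages; not data from the article.
y_true = ["I", "I", "II", "II", "III", "III"]
y_pred = ["I", "II", "II", "II", "III", "I"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Under the relative reading, a baseline F1 of 0.50 rising to 0.695 would correspond to a 39% improvement; under an absolute reading the same statement would mean 0.50 rising to 0.89, so the distinction matters when comparing against the published figures.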

