Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks

Bibliographic Details
Main authors: Hsieh, Ai-Ru; Li, Yi-Mei Aimee
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8859115/
https://www.ncbi.nlm.nih.gov/pubmed/35198009
http://dx.doi.org/10.3389/fgene.2022.822117
author Hsieh, Ai-Ru
Li, Yi-Mei Aimee
collection PubMed
description With precision medicine as the goal, the human biobank of each country should be analyzed to determine the complete research results related to genetic diseases. In addition, with the increase in medical imaging data, automatic image processing with image recognition has been widely studied and applied in biomedicine. However, case–control data imbalance often occurs in human biobanks, which is usually solved by the statistical method SAIGE. Due to the huge amount of genetic data in human biobanks, the direct use of the SAIGE method often faces the problem of insufficient computer memory to support calculations and excessive calculation time. The other method is to use sampling to adjust the data to balance the case–control ratio, which is called Synthetic Minority Oversampling Technique (SMOTE). Our study employed the Manhattan plot and genetic disease information from the Taiwan Biobank to adjust the imbalance in the case–control ratio by SMOTE, called “TW-SMOTE.” We further used a deep learning image recognition system to identify the TW-SMOTE. We found that TW-SMOTE can achieve the same results as that of SAIGE and the UK Biobank (UKB). The processing of the technical data can be equivalent to the use of data plots with a relatively large UKB sample size and achieve the same effect as that of SAIGE in addressing data imbalance.
format Online
Article
Text
id pubmed-8859115
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8859115 2022-02-22 Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks Hsieh, Ai-Ru Li, Yi-Mei Aimee Front Genet Genetics Frontiers Media S.A. 2022-02-07 /pmc/articles/PMC8859115/ /pubmed/35198009 http://dx.doi.org/10.3389/fgene.2022.822117 Text en Copyright © 2022 Hsieh and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8859115/
https://www.ncbi.nlm.nih.gov/pubmed/35198009
http://dx.doi.org/10.3389/fgene.2022.822117