Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks
Main Authors: | Hsieh, Ai-Ru; Li, Yi-Mei Aimee |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8859115/ https://www.ncbi.nlm.nih.gov/pubmed/35198009 http://dx.doi.org/10.3389/fgene.2022.822117 |
collection | PubMed |
description | With precision medicine as the goal, each country's human biobank should be analyzed to obtain complete research results on genetic diseases. In addition, as medical imaging data grow, automatic image processing based on image recognition has been widely studied and applied in biomedicine. However, human biobanks often show case–control data imbalance, which is usually handled with the statistical method SAIGE. Because human biobanks hold such a large amount of genetic data, applying SAIGE directly often runs into insufficient computer memory and excessive computation time. An alternative is to resample the data to balance the case–control ratio, using the Synthetic Minority Oversampling Technique (SMOTE). Our study used the Manhattan plot and genetic disease information from the Taiwan Biobank to adjust the case–control imbalance with SMOTE, an approach we call “TW-SMOTE.” We then used a deep learning image recognition system to identify the TW-SMOTE results. We found that TW-SMOTE achieves the same results as SAIGE and the UK Biobank (UKB). This processing can yield results equivalent to those from data plots based on the relatively large UKB sample, and achieves the same effect as SAIGE in addressing data imbalance. |
id | pubmed-8859115 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8859115 2022-02-22. Front Genet (Genetics), Frontiers Media S.A., 2022-02-07. Text, en. Copyright © 2022 Hsieh and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Genetics |
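The abstract names SMOTE as the resampling step behind TW-SMOTE but does not say how SMOTE creates new cases. Below is a minimal, standard-library sketch of generic SMOTE interpolation. It is not the authors' TW-SMOTE pipeline. The function names (`smote`, `oversample_cases`), the 2-D toy feature vectors, and the choice to oversample cases up to the control count are all hypothetical illustrations. A real analysis would more likely rely on an established implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours (Euclidean) of points[idx], excluding itself."""
    x = points[idx]
    dists = sorted((math.dist(x, p), j) for j, p in enumerate(points) if j != idx)
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples, each placed at a random point on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a minority sample at random
        j = rng.choice(neighbours[i])     # pick one of its nearest minority neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        x, n = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, n)))
    return synthetic


def oversample_cases(cases, controls, k=5, seed=0):
    """Hypothetical helper: add synthetic cases until cases and controls are equal in number."""
    deficit = len(controls) - len(cases)
    if deficit <= 0:
        return list(cases)
    return list(cases) + smote(cases, deficit, k=k, seed=seed)


# Toy example: 3 cases vs. 10 controls (made-up 2-D feature vectors).
cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
controls = [(5.0 + i, 5.0) for i in range(10)]
balanced = oversample_cases(cases, controls, k=2)
print(len(balanced), len(controls))  # 10 10
```

Because each synthetic case lies on a line segment between two real cases, the new samples stay within the region already occupied by the minority class. Simply duplicating existing cases would not spread the samples out in this way.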