Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

SIMPLE SUMMARY: Classifying a target population at the genetic level can provide important information for the preservation and commercial use of a breed. In this study, combinations of the minimum number of markers were used to distinguish target populations based on high-density single nucleotide p...

Bibliographic Details
Main Authors: Seo, Dongwon, Cho, Sunghyun, Manjula, Prabuddha, Choi, Nuri, Kim, Young-Kuk, Koh, Yeong Jun, Lee, Seung Hwan, Kim, Hyung-Yong, Lee, Jun Heon
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7835996/
https://www.ncbi.nlm.nih.gov/pubmed/33477975
http://dx.doi.org/10.3390/ani11010241
_version_ 1783642656465747968
author Seo, Dongwon
Cho, Sunghyun
Manjula, Prabuddha
Choi, Nuri
Kim, Young-Kuk
Koh, Yeong Jun
Lee, Seung Hwan
Kim, Hyung-Yong
Lee, Jun Heon
collection PubMed
description SIMPLE SUMMARY: Classifying a target population at the genetic level can provide important information for the preservation and commercial use of a breed. In this study, combinations of the minimum number of markers were used to distinguish target populations based on high-density single nucleotide polymorphism (SNP) array data. A genome-wide association study, which filters target-population-specific SNPs between the case and control groups, and principal component analysis with machine learning algorithms could be used to explore various combinations with the minimum number of markers. In addition, the optimal combination of SNP markers produced stable results for the target population in verification studies in which additional samples were analyzed.
ABSTRACT: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence in the origin of the population. This would facilitate the protection of native genetic resources in each country's market. In this study, a total of 283 samples from 20 lines, comprising Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed using a 600 k high-density single nucleotide polymorphism (SNP) array to determine the optimal marker combination comprising the minimum number of markers. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. During marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups and reduced the number of genetic components required, confirming that the groups could be classified efficiently with a small marker set. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations. The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine, from a large number of SNP markers, the optimal combination with the minimum number of markers that distinguishes the target population.
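The abstract reports that the selected markers raised the fixation index (Fst) between the case and control groups. As a rough illustration of what that statistic measures, the sketch below computes a sample-size-weighted Wright's Fst, (H_T - H_S) / H_T, for each biallelic SNP and ranks SNPs by it. This is not the authors' pipeline: the genotype coding (0/1/2 copies of the alternate allele), the toy matrices, and the function names `allele_freq`, `fst_two_groups`, and `rank_snps_by_fst` are all assumptions made for the example, and the study may have used a different Fst estimator.

```python
# Illustrative only: toy genotypes coded as 0/1/2 copies of the alternate allele.
# Rows are samples, columns are SNPs. None of these values come from the study.

def allele_freq(genotypes):
    """Alternate-allele frequency from a list of 0/1/2 genotype codes."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_two_groups(case, control):
    """Wright's Fst for one biallelic SNP, (H_T - H_S) / H_T, weighted by sample size."""
    n1, n2 = len(case), len(control)
    p1, p2 = allele_freq(case), allele_freq(control)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled sample
    if h_t == 0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    # mean expected heterozygosity within the two groups
    h_s = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
    return (h_t - h_s) / h_t


def rank_snps_by_fst(case_matrix, control_matrix):
    """Return SNP column indices sorted from most to least differentiated."""
    n_snps = len(case_matrix[0])
    scores = []
    for j in range(n_snps):
        case_col = [row[j] for row in case_matrix]
        control_col = [row[j] for row in control_matrix]
        scores.append(fst_two_groups(case_col, control_col))
    return sorted(range(n_snps), key=lambda j: scores[j], reverse=True)


# Hypothetical data: 3 case samples and 3 control samples typed at 3 SNPs.
case_genotypes = [[2, 1, 0], [2, 1, 0], [2, 0, 1]]
control_genotypes = [[0, 1, 1], [0, 1, 2], [0, 0, 2]]

ranking = rank_snps_by_fst(case_genotypes, control_genotypes)
print(ranking)
```

A SNP fixed for opposite alleles in the two groups scores 1, and a SNP with identical allele frequencies scores 0, so a ranking like this is one simple way to shortlist population-specific markers before a classifier is trained on them.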
format Online
Article
Text
id pubmed-7835996
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7835996 2021-01-27 Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs Seo, Dongwon Cho, Sunghyun Manjula, Prabuddha Choi, Nuri Kim, Young-Kuk Koh, Yeong Jun Lee, Seung Hwan Kim, Hyung-Yong Lee, Jun Heon Animals (Basel) Article MDPI 2021-01-19 /pmc/articles/PMC7835996/ /pubmed/33477975 http://dx.doi.org/10.3390/ani11010241 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7835996/
https://www.ncbi.nlm.nih.gov/pubmed/33477975
http://dx.doi.org/10.3390/ani11010241