Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm
Chinese indigenous pig breeds have unique genetic characteristics and a rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), which were obtained from 1059 individuals of 18...
Main authors: | Gao, Jun; Sun, Lingwei; Zhang, Shushan; Xu, Jiehuan; He, Mengqian; Zhang, Defu; Wu, Caifeng; Dai, Jianjun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9778029/ https://www.ncbi.nlm.nih.gov/pubmed/36553474 http://dx.doi.org/10.3390/genes13122207 |
_version_ | 1784856255711412224 |
---|---|
author | Gao, Jun; Sun, Lingwei; Zhang, Shushan; Xu, Jiehuan; He, Mengqian; Zhang, Defu; Wu, Caifeng; Dai, Jianjun |
author_facet | Gao, Jun; Sun, Lingwei; Zhang, Shushan; Xu, Jiehuan; He, Mengqian; Zhang, Defu; Wu, Caifeng; Dai, Jianjun |
author_sort | Gao, Jun |
collection | PubMed |
description | Chinese indigenous pig breeds have unique genetic characteristics and rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), obtained from 1059 individuals of 18 Chinese indigenous pig breeds and 5 cosmopolitan breeds, was used to screen discriminating SNPs for pig breed identification. After linkage disequilibrium (LD) pruning, the study excluded 396 SNPs on non-autosomal chromosomes and retained 20.92–27.84% of the SNPs on each of the 18 autosomes, leaving a total of 14,823 SNPs. Principal component analysis (PCA) showed the largest differences between cosmopolitan and Chinese pig breeds (PC1 = 10.452%), while relatively small differences were found among the 18 indigenous pig breeds from the Yangtze River Delta region of China. Next, a random forest (RF) algorithm was used to filter these SNPs, with the optimal number of decision trees (ntree = 1000) determined from the corresponding out-of-bag (OOB) error rates. Two SNP ranking methods in the RF analysis, mean decrease in accuracy (MDA) and mean decrease in Gini index (MDG), were compared, along with the effect of panel size on assignment accuracy and the distribution of panel SNPs across chromosomes; a panel of the 1000 most breed-discriminative tagged SNPs was finally selected using the MDA ranking. Breed prediction for the 318 samples in the RF test set achieved high accuracy (>99.3%); thus, a machine learning classification method was established for the multi-breed identification of Chinese indigenous pigs based on a low-density SNP panel. |
format | Online Article Text |
id | pubmed-9778029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9778029 2022-12-23 Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm Gao, Jun; Sun, Lingwei; Zhang, Shushan; Xu, Jiehuan; He, Mengqian; Zhang, Defu; Wu, Caifeng; Dai, Jianjun Genes (Basel) Article Chinese indigenous pig breeds have unique genetic characteristics and rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), obtained from 1059 individuals of 18 Chinese indigenous pig breeds and 5 cosmopolitan breeds, was used to screen discriminating SNPs for pig breed identification. After linkage disequilibrium (LD) pruning, the study excluded 396 SNPs on non-autosomal chromosomes and retained 20.92–27.84% of the SNPs on each of the 18 autosomes, leaving a total of 14,823 SNPs. Principal component analysis (PCA) showed the largest differences between cosmopolitan and Chinese pig breeds (PC1 = 10.452%), while relatively small differences were found among the 18 indigenous pig breeds from the Yangtze River Delta region of China. Next, a random forest (RF) algorithm was used to filter these SNPs, with the optimal number of decision trees (ntree = 1000) determined from the corresponding out-of-bag (OOB) error rates. Two SNP ranking methods in the RF analysis, mean decrease in accuracy (MDA) and mean decrease in Gini index (MDG), were compared, along with the effect of panel size on assignment accuracy and the distribution of panel SNPs across chromosomes; a panel of the 1000 most breed-discriminative tagged SNPs was finally selected using the MDA ranking. Breed prediction for the 318 samples in the RF test set achieved high accuracy (>99.3%); thus, a machine learning classification method was established for the multi-breed identification of Chinese indigenous pigs based on a low-density SNP panel. MDPI 2022-11-25 /pmc/articles/PMC9778029/ /pubmed/36553474 http://dx.doi.org/10.3390/genes13122207 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gao, Jun Sun, Lingwei Zhang, Shushan Xu, Jiehuan He, Mengqian Zhang, Defu Wu, Caifeng Dai, Jianjun Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm |
title | Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9778029/ https://www.ncbi.nlm.nih.gov/pubmed/36553474 http://dx.doi.org/10.3390/genes13122207 |
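
The description above mentions ranking SNPs by the mean decrease in Gini index (MDG). As a rough illustration only, and not the authors' pipeline, the sketch below computes the Gini impurity decrease of a single split on one SNP, which is the per-node quantity that a random forest averages over its trees to obtain MDG. The 0/1/2 genotype coding, the breed labels, the SNP names, and the helper functions are all assumptions invented for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(genotypes, breeds, threshold):
    """Impurity decrease from splitting samples on genotype <= threshold."""
    left = [b for g, b in zip(genotypes, breeds) if g <= threshold]
    right = [b for g, b in zip(genotypes, breeds) if g > threshold]
    n = len(breeds)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(breeds) - weighted


def best_decrease(genotypes, breeds):
    """Largest decrease over the two thresholds possible with 0/1/2 coding."""
    return max(gini_decrease(genotypes, breeds, t) for t in (0, 1))


# Toy data, invented for illustration: six animals from two hypothetical breeds.
breeds = ["A", "A", "A", "B", "B", "B"]
snps = {
    "snp_informative": [0, 0, 0, 2, 2, 2],    # separates the breeds perfectly
    "snp_partial": [0, 0, 1, 1, 2, 2],        # separates them only partly
    "snp_uninformative": [1, 1, 1, 1, 1, 1],  # identical in every animal
}

# Rank SNPs from most to least breed-discriminating on this single split.
ranking = sorted(snps, key=lambda s: best_decrease(snps[s], breeds), reverse=True)
print(ranking)
```

A real forest would evaluate such splits on bootstrap samples and random SNP subsets across many trees (the record reports ntree = 1000), so this single-split score is only a simplified stand-in for the MDG ranking. The MDA ranking that the study finally selected instead measures the drop in out-of-bag accuracy after permuting a SNP and is not shown here.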