
Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm

Chinese indigenous pig breeds have unique genetic characteristics and a rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), which were obtained from 1059 individuals of 18...


Bibliographic Details
Main Authors: Gao, Jun, Sun, Lingwei, Zhang, Shushan, Xu, Jiehuan, He, Mengqian, Zhang, Defu, Wu, Caifeng, Dai, Jianjun
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9778029/
https://www.ncbi.nlm.nih.gov/pubmed/36553474
http://dx.doi.org/10.3390/genes13122207
author Gao, Jun
Sun, Lingwei
Zhang, Shushan
Xu, Jiehuan
He, Mengqian
Zhang, Defu
Wu, Caifeng
Dai, Jianjun
collection PubMed
description Chinese indigenous pig breeds have unique genetic characteristics and rich genetic diversity; however, effective methods for identifying these breeds have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), obtained from 1059 individuals of 18 Chinese indigenous pig breeds and 5 cosmopolitan breeds, was used to screen for SNPs that discriminate between pig breeds. After linkage disequilibrium (LD) pruning, 396 SNPs on non-constant chromosomes were excluded and 20.92–27.84% of the SNPs on each of the 18 autosomes were retained, leaving a total of 14,823 SNPs. Principal component analysis (PCA) showed the largest differences between the cosmopolitan and Chinese breeds (PC1 = 10.452%), whereas the differences among the 18 indigenous breeds from the Yangtze River Delta region of China were relatively small. Next, a random forest (RF) algorithm was used to filter these SNPs, with the optimal number of decision trees (ntree = 1000) chosen from the corresponding out-of-bag (OOB) error rates. Two SNP ranking methods in the RF analysis, mean decrease in accuracy (MDA) and mean decrease in Gini index (MDG), were compared, together with the effect of panel size on assignment accuracy and the distribution of panel SNPs across chromosomes; on this basis, a panel of the 1000 most breed-discriminative tagged SNPs was selected using the MDA ranking. Breed prediction for the 318 samples in the RF test set reached an accuracy above 99.3%, establishing a machine learning classification method for multi-breed identification of Chinese indigenous pigs based on a low-density SNP panel.
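The MDA ranking used to build the 1000-SNP panel can be illustrated with a small permutation-importance sketch. This is not the authors' pipeline: it assumes genotypes coded as allele dosages (0/1/2), and it swaps the random forest for a simple nearest-centroid classifier so the example runs with the Python standard library alone. In RF implementations such as R's randomForest, MDA is instead computed for each tree on that tree's out-of-bag samples and averaged over trees. All function names below (fit_nearest_centroid, permutation_importance, select_panel) and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_nearest_centroid(X, y):
    """Stand-in classifier, NOT a random forest: mean genotype vector per breed."""
    centroids = {}
    for label in sorted(set(y)):  # sorted for deterministic tie-breaking
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the breed whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, X, y):
    """Fraction of individuals assigned to their true breed."""
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """MDA-style score per SNP: mean drop in accuracy after shuffling that SNP's column."""
    rng = random.Random(seed)
    base = accuracy(centroids, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between SNP j and breed
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(centroids, X_perm, y))
        scores.append(mean(drops))
    return scores


def select_panel(scores, k):
    """Indices of the k SNPs with the largest mean decrease in accuracy."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data for two hypothetical breeds: SNP 0 separates them, SNP 1 is monomorphic.
    X = [[0, 1] for _ in range(10)] + [[2, 1] for _ in range(10)]
    y = ["A"] * 10 + ["B"] * 10
    model = fit_nearest_centroid(X, y)
    scores = permutation_importance(model, X, y)
    print(scores, select_panel(scores, k=1))
```

Swapping in an actual random forest would change only how the model is fitted and queried; the permute-and-rescore loop and the top-k selection step stay the same.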
format Online
Article
Text
id pubmed-9778029
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9778029 2022-12-23 Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm Gao, Jun Sun, Lingwei Zhang, Shushan Xu, Jiehuan He, Mengqian Zhang, Defu Wu, Caifeng Dai, Jianjun Genes (Basel) Article MDPI 2022-11-25 /pmc/articles/PMC9778029/ /pubmed/36553474 http://dx.doi.org/10.3390/genes13122207 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm
topic Article