Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique

Bibliographic Details
Main authors: Pasupa, Kitsuchart, Rathasamuth, Wanthanee, Tongsima, Sissades
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251909/
https://www.ncbi.nlm.nih.gov/pubmed/32456608
http://dx.doi.org/10.1186/s12859-020-3471-4
_version_ 1783539052127977472
author Pasupa, Kitsuchart
Rathasamuth, Wanthanee
Tongsima, Sissades
author_facet Pasupa, Kitsuchart
Rathasamuth, Wanthanee
Tongsima, Sissades
author_sort Pasupa, Kitsuchart
collection PubMed
description BACKGROUND: The number of porcine Single Nucleotide Polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. For the breed classification problem, however, one needs a much smaller set of porcine-classifying SNPs (PCSNPs) that can still accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods. We experimented with different combinations of feature selection methods, including information gain, conventional and modified genetic algorithms, and our own frequency feature selection method, each paired with a common classification method, Support Vector Machine, to evaluate the combination's performance. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia, including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand.
RESULTS: The best combination of feature selection methods (a hybrid of information gain, modified genetic algorithm, and frequency feature selection) reduced the number of possible PCSNPs to only 1.62% (164 PCSNPs) of the total number of SNPs (10,210 SNPs) while maintaining a high classification accuracy (95.12%). Moreover, the performance of this PCSNP set was nearly identical to that of larger SNP sets and even to that of the entire data set. In addition, most PCSNPs matched a set of 94 genes in the PANTHER pathway, in line with a suggestion by the Porcine Genomic Sequencing Initiative.
CONCLUSIONS: The best hybrid method provided a sufficiently small number of porcine SNPs that accurately classified swine breeds.
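As a rough illustration of the first stage named in the abstract, information gain ranking, the sketch below scores each SNP by how much it reduces the entropy of the breed labels and keeps the top-scoring ones. It is not the authors' implementation: the 0/1/2 genotype coding, the function names (`entropy`, `information_gain`, `rank_snps`), and the toy data are all assumptions made for the example, and the genetic algorithm, frequency feature selection, and Support Vector Machine stages are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(genotypes, breeds):
    """Reduction in breed entropy obtained by splitting on one SNP's genotypes."""
    groups = {}
    for genotype, breed in zip(genotypes, breeds):
        groups.setdefault(genotype, []).append(breed)
    # Weighted entropy of the breed labels within each genotype group.
    remainder = sum(len(g) / len(breeds) * entropy(g) for g in groups.values())
    return entropy(breeds) - remainder


def rank_snps(snp_table, breeds, top_k):
    """Return the names of the top_k SNPs by information gain, best first."""
    ranked = sorted(
        snp_table,
        key=lambda name: information_gain(snp_table[name], breeds),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy data: 6 pigs from 2 breeds, genotypes coded 0/1/2 (assumed coding).
breeds = ["A", "A", "A", "B", "B", "B"]
snp_table = {
    "snp_informative": [0, 0, 0, 2, 2, 2],  # separates the breeds perfectly
    "snp_partial": [0, 0, 1, 1, 2, 2],      # separates them only in part
    "snp_noise": [0, 1, 2, 0, 1, 2],        # carries no breed information
}
top = rank_snps(snp_table, breeds, top_k=2)
```

In a hybrid pipeline like the one the abstract describes, a filter of this kind would typically shrink the candidate pool before a more expensive search is run over the survivors; how the authors chain the stages is described in the article itself.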
format Online
Article
Text
id pubmed-7251909
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7251909 2020-06-07
Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique
Pasupa, Kitsuchart
Rathasamuth, Wanthanee
Tongsima, Sissades
BMC Bioinformatics
Methodology Article
BioMed Central 2020-05-26
/pmc/articles/PMC7251909/
/pubmed/32456608
http://dx.doi.org/10.1186/s12859-020-3471-4
Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251909/
https://www.ncbi.nlm.nih.gov/pubmed/32456608
http://dx.doi.org/10.1186/s12859-020-3471-4