Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique
Main authors: | Pasupa, Kitsuchart; Rathasamuth, Wanthanee; Tongsima, Sissades |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251909/ https://www.ncbi.nlm.nih.gov/pubmed/32456608 http://dx.doi.org/10.1186/s12859-020-3471-4 |
author | Pasupa, Kitsuchart; Rathasamuth, Wanthanee; Tongsima, Sissades |
---|---|
collection | PubMed |
description | BACKGROUND: The number of porcine Single Nucleotide Polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. For breed classification, however, one needs a much smaller set of porcine-classifying SNPs (PCSNPs) that can accurately assign pigs to their breeds. This study sought such PCSNPs by trying several combinations of feature selection and classification methods. We combined feature selection methods, namely information gain, conventional and modified genetic algorithms, and our own frequency feature selection method, with a common classifier, the Support Vector Machine, and evaluated the performance of each combination. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia, including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand. RESULTS: The best combination of feature selection methods, a hybrid of information gain, the modified genetic algorithm, and frequency feature selection, reduced the number of candidate PCSNPs to only 1.62% (164 PCSNPs) of the total (10,210 SNPs) while maintaining a high classification accuracy (95.12%). Moreover, this PCSNP set performed nearly identically to larger SNP sets and even to the entire data set. In addition, most PCSNPs matched well to a set of 94 genes in the PANTHER pathway, consistent with a suggestion by the Porcine Genomic Sequencing Initiative. CONCLUSIONS: The best hybrid method provided a sufficiently small set of porcine SNPs that accurately classified swine breeds. |
format | Online Article Text |
id | pubmed-7251909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7251909, 2020-06-07. BMC Bioinformatics, Methodology Article. BioMed Central, 2020-05-26. /pmc/articles/PMC7251909/ /pubmed/32456608 http://dx.doi.org/10.1186/s12859-020-3471-4. Text, en. © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique |
topic | Methodology Article |
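The description field names information gain as the first stage of the hybrid selection. As a minimal sketch only, not the authors' implementation, information gain for a single SNP can be computed as the breed entropy H(Y) minus the breed entropy conditioned on that SNP's genotype, and SNPs can then be ranked by that score. The 0/1/2 genotype coding, the toy data, and the function names `entropy`, `information_gain`, and `rank_snps` are illustrative assumptions; the genetic algorithm, the frequency feature selection step, and the Support Vector Machine classifier are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for one discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)  # breed labels observed for genotype x
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def rank_snps(genotypes, breeds, top_k):
    """Indices of the top_k SNP columns, highest information gain first.

    genotypes: one row per pig, one genotype code per SNP (assumed here to be
    0/1/2 = copies of the minor allele); breeds: one label per pig.
    Ties are broken by the lower column index.
    """
    scores = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        scores.append((information_gain(column, breeds), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy data: SNP 0 separates the breeds perfectly,
    # SNP 1 partially, SNP 2 not at all.
    genotypes = [[0, 2, 1], [0, 1, 1], [2, 0, 1], [2, 1, 1]]
    breeds = ["A", "A", "B", "B"]
    print(rank_snps(genotypes, breeds, top_k=2))  # [0, 1]
```

In a pipeline like the one the abstract describes, a ranking of this kind would only pre-filter SNPs before the later search stages; this record does not state how many SNPs the authors kept after the information gain step.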