
A Modified T-test Feature Selection Method and Its Application on the HapMap Genotype Data

Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For...


Bibliographic Details
Main authors: Zhou, Nina; Wang, Lipo
Format: Online Article Text
Language: English
Published: Elsevier 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054219/
https://www.ncbi.nlm.nih.gov/pubmed/18267305
http://dx.doi.org/10.1016/S1672-0229(08)60011-X
_version_ 1782458552775868416
author Zhou, Nina
Wang, Lipo
collection PubMed
description Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For more insights on human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. Firstly, we rank all SNPs in comparison with other feature importance measures including F-statistics and the informativeness for assignment. Secondly, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, so as to find the best feature subset corresponding to the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.
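The description above outlines a two-step pipeline: rank SNPs by a t-test-style score, then pass the top-ranked SNPs to a classifier. This record does not give the modified statistic itself, so the sketch below substitutes a generic multi-class t-like score (largest standardized gap between a class mean and the overall mean). The function names `t_scores` and `top_k`, the 0/1/2 genotype coding, and the scoring formula are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def t_scores(X, y):
    """Score each feature with a generic multi-class t-like statistic.

    X: list of samples, each a list of numeric genotype codes (assumed 0/1/2).
    y: list of class labels, one per sample.

    Assumed score for feature j (illustrative, NOT the paper's formula):
        max over classes c of |mean_c - mean_all| / (m_c * s_j)
    where s_j is the pooled within-class standard deviation and
    m_c = sqrt(1/n_c - 1/n).
    """
    n = len(X)
    n_features = len(X[0])

    # Group the samples by population label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    k = len(by_class)

    scores = []
    for j in range(n_features):
        overall = sum(row[j] for row in X) / n
        means = {c: sum(r[j] for r in rows) / len(rows)
                 for c, rows in by_class.items()}

        # Pooled within-class variance with n - k degrees of freedom.
        ss = sum((r[j] - means[c]) ** 2
                 for c, rows in by_class.items() for r in rows)
        s = math.sqrt(ss / (n - k)) if n > k else 0.0

        best = 0.0
        for c, rows in by_class.items():
            m = math.sqrt(max(1.0 / len(rows) - 1.0 / n, 0.0))
            diff = abs(means[c] - overall)
            denom = m * s
            if denom > 0:
                t = diff / denom
            else:
                # Zero spread: a nonzero gap separates the class perfectly.
                t = math.inf if diff > 0 else 0.0
            best = max(best, t)
        scores.append(best)
    return scores


def top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data: feature 0 differs between groups A and B, feature 1 does not.
X = [[0, 0], [1, 1], [2, 0], [3, 1]]
y = ["A", "A", "B", "B"]
scores = t_scores(X, y)
selected = top_k(scores, 1)
```

In a fuller pipeline, the columns listed in `selected` would be the input features for a classifier such as a support vector machine, and the number of selected SNPs would be varied to find the subset with the best accuracy.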
format Online
Article
Text
id pubmed-5054219
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5054219 2016-10-14 A Modified T-test Feature Selection Method and Its Application on the HapMap Genotype Data Zhou, Nina Wang, Lipo Genomics Proteomics Bioinformatics Method Elsevier 2007 2008-02-08 /pmc/articles/PMC5054219/ /pubmed/18267305 http://dx.doi.org/10.1016/S1672-0229(08)60011-X Text en © 2007 Beijing Institute of Genomics http://creativecommons.org/licenses/by-nc-sa/3.0/ This is an open access article under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/3.0/).
title A Modified T-test Feature Selection Method and Its Application on the HapMap Genotype Data
topic Method