Genetic Classification of Populations Using Supervised Learning

Bibliographic Details
Main Authors: Bridges, Michael, Heron, Elizabeth A., O'Dushlaine, Colm, Segurado, Ricardo, Morris, Derek, Corvin, Aiden, Gill, Michael, Pinto, Carlos
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093382/
https://www.ncbi.nlm.nih.gov/pubmed/21589856
http://dx.doi.org/10.1371/journal.pone.0014802
author Bridges, Michael
Heron, Elizabeth A.
O'Dushlaine, Colm
Segurado, Ricardo
Morris, Derek
Corvin, Aiden
Gill, Michael
Pinto, Carlos
collection PubMed
description There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case–control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
format Text
id pubmed-3093382
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3093382 2011-05-17 Genetic Classification of Populations Using Supervised Learning. PLoS One, Research Article. Public Library of Science 2011-05-12 /pmc/articles/PMC3093382/ /pubmed/21589856 http://dx.doi.org/10.1371/journal.pone.0014802 Text en. Bridges et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Genetic Classification of Populations Using Supervised Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093382/
https://www.ncbi.nlm.nih.gov/pubmed/21589856
http://dx.doi.org/10.1371/journal.pone.0014802
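The abstract's central distinction is that a supervised method uses the known population membership of training individuals, whereas an unsupervised method such as principal components analysis does not. The Python sketch below illustrates only the supervised idea, under loudly stated assumptions: the two populations, their allele frequencies and the sample sizes are simulated and hypothetical; SNPs are assumed independent and in Hardy-Weinberg equilibrium; and the classifier is a simple allele-frequency likelihood rule (naive Bayes style) swapped in for clarity. It is not the neural network or support vector machine used by the authors, and its accuracy says nothing about their results.

```python
import math
import random

# Hypothetical toy setting (not the authors' data): two populations that differ
# only by small allele-frequency shifts at many independent biallelic SNPs.
# Genotypes are coded as 0, 1 or 2 copies of one allele.


def simulate_population(n_individuals, allele_freqs, rng):
    """Sample genotypes assuming Hardy-Weinberg equilibrium and independent SNPs."""
    return [[sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
            for _ in range(n_individuals)]


def fit_allele_freqs(genotypes, pseudocount=1.0):
    """Estimate per-SNP allele frequencies from a labelled training sample.

    The pseudocount keeps every estimate strictly between 0 and 1,
    so the logarithms below are always defined.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    return [(sum(g[j] for g in genotypes) + pseudocount) / (2 * n + 2 * pseudocount)
            for j in range(n_snps)]


def log_likelihood(genotype, freqs):
    """Binomial(2, p) log-likelihood summed over SNPs.

    The binomial coefficient C(2, g) is dropped because it is the same
    for every candidate population and cannot change the decision.
    """
    return sum(g * math.log(p) + (2 - g) * math.log(1.0 - p)
               for g, p in zip(genotype, freqs))


def classify(genotype, model):
    """Assign an individual to the population whose frequencies fit it best."""
    return max(model, key=lambda label: log_likelihood(genotype, model[label]))


rng = random.Random(0)
n_snps = 500
freqs_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
# Population B: each frequency shifted by at most +/-0.1, clipped to [0.05, 0.95].
freqs_b = [min(0.95, max(0.05, p + rng.uniform(-0.1, 0.1))) for p in freqs_a]

training = {"A": simulate_population(200, freqs_a, rng),
            "B": simulate_population(200, freqs_b, rng)}
held_out = ([(g, "A") for g in simulate_population(100, freqs_a, rng)]
            + [(g, "B") for g in simulate_population(100, freqs_b, rng)])

# Supervised step: the known population labels are used to fit the model.
model = {label: fit_allele_freqs(genos) for label, genos in training.items()}
accuracy = sum(classify(g, model) == label for g, label in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each held-out individual is assigned to whichever population's estimated allele frequencies give it the higher log-likelihood. The labels enter only when those frequencies are fitted, which is what makes the approach supervised; an unsupervised method would have to discover the split from the genotypes alone.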