
Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers

Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian c...


Bibliographic Details
Main authors: Gorin, Igor; Balanovsky, Oleg; Kozlov, Oleg; Koshel, Sergey; Kostryukova, Elena; Zhabagin, Maxat; Agdzhoyan, Anastasiya; Pylev, Vladimir; Balanovska, Elena
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149316/
https://www.ncbi.nlm.nih.gov/pubmed/35651934
http://dx.doi.org/10.3389/fgene.2022.902309
author Gorin, Igor
Balanovsky, Oleg
Kozlov, Oleg
Koshel, Sergey
Kostryukova, Elena
Zhabagin, Maxat
Agdzhoyan, Anastasiya
Pylev, Vladimir
Balanovska, Elena
author_sort Gorin, Igor
collection PubMed
description Currently available genetic tools effectively distinguish between different continental origins. However, North Eurasia, which constitutes one-third of the world’s largest continent, remains severely underrepresented. The dataset used in this study represents 266 populations from 12 North Eurasian countries, including most of the ethnic diversity across Russia’s vast territory. A total of 1,883 samples were genotyped using the Illumina Infinium Omni5Exome-4 v1.3 BeadChip. Three principal components were computed for the entire dataset using three iterations for outlier removal. This allowed the 266 populations to be merged into larger groups while maintaining intragroup homogeneity, yielding 29 ethnic geographic groups that were genetically distinct enough to trace individual ancestry. Several feature selection methods, including the random forest algorithm, were tested to estimate the number of genetic markers needed to differentiate between the groups; 5,229 ancestry-informative SNPs were selected. We tested various classifiers that support multiple classes and output a value for each class that can be interpreted as a probability. Logistic regression was chosen as the best mathematical model for predicting ancestral populations. The machine learning algorithm for inferring an ancestral ethnic geographic group was implemented in the original software “Homeland”, which comprises an interface module, a prediction module, and a cartographic module. Examples of geographic maps showing the likelihood of geographic ancestry for individuals from different regions of North Eurasia are provided. Validation showed that the largest number of ethnic geographic groups predicted with almost absolute accuracy and sensitivity was observed in South and Central Siberia, the Far East, and Kamchatka. The overall accuracy of predicting one of the 29 ethnic geographic groups reached 71%. The proposed method can be employed to predict ancestries from the populations of Russia and its neighboring states. It can be used in forensic science and genetic genealogy.
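The abstract describes a classifier that outputs, for each ethnic geographic group, a value interpretable as a probability. A minimal sketch of how a multinomial logistic regression turns per-group scores into such probabilities is given below. It is an illustration only, not the Homeland implementation: the group names, weights, biases, and the four-SNP genotype are hypothetical toy values, and genotypes are assumed to be coded as 0/1/2 allele counts.

```python
import math

def softmax(scores):
    """Convert raw per-group scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(genotype, weights, biases):
    """Multinomial logistic regression: one linear score per group, then softmax.

    genotype: list of allele counts (0, 1, or 2), one per SNP (assumed coding)
    weights:  one row of per-SNP coefficients for each group
    biases:   one intercept for each group
    """
    scores = [
        b + sum(w * g for w, g in zip(row, genotype))
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical toy model: 3 groups and 4 SNPs (the study uses 29 groups and 5,229 SNPs).
GROUPS = ["group_A", "group_B", "group_C"]
WEIGHTS = [
    [0.8, -0.2, 0.1, 0.0],
    [-0.3, 0.9, 0.0, 0.2],
    [0.0, -0.1, 0.7, -0.4],
]
BIASES = [0.0, 0.1, -0.1]

sample = [2, 0, 1, 0]  # hypothetical genotype of one individual
probs = predict_proba(sample, WEIGHTS, BIASES)
best = max(zip(GROUPS, probs), key=lambda gp: gp[1])

for group, p in zip(GROUPS, probs):
    print(f"{group}: {p:.3f}")
print("most likely group:", best[0])
```

In a real setting the coefficients would be fitted on the genotyped reference samples, and the per-group probabilities could then be drawn on a map, as the cartographic module described in the abstract does.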
format Online
Article
Text
id pubmed-9149316
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9149316 2022-05-31 Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers Gorin, Igor; Balanovsky, Oleg; Kozlov, Oleg; Koshel, Sergey; Kostryukova, Elena; Zhabagin, Maxat; Agdzhoyan, Anastasiya; Pylev, Vladimir; Balanovska, Elena Front Genet Genetics Frontiers Media S.A. 2022-05-16 /pmc/articles/PMC9149316/ /pubmed/35651934 http://dx.doi.org/10.3389/fgene.2022.902309 Text en Copyright © 2022 Gorin, Balanovsky, Kozlov, Koshel, Kostryukova, Zhabagin, Agdzhoyan, Pylev and Balanovska. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Determining the Area of Ancestral Origin for Individuals From North Eurasia Based on 5,229 SNP Markers
topic Genetics