Population Levels Assessment of the Distribution of Disease-Associated Variants With Emphasis on Armenians – A Machine Learning Approach

Background: Over the last decades, genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with different complex diseases. However, associations reported in one population are often conflicting and do not replicate when studied...

Bibliographic Details
Main authors: Nikoghosyan, Maria, Hakobyan, Siras, Hovhannisyan, Anahit, Loeffler-Wirth, Henry, Binder, Hans, Arakelyan, Arsen
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6498285/
https://www.ncbi.nlm.nih.gov/pubmed/31105750
http://dx.doi.org/10.3389/fgene.2019.00394
_version_ 1783415599912714240
author Nikoghosyan, Maria
Hakobyan, Siras
Hovhannisyan, Anahit
Loeffler-Wirth, Henry
Binder, Hans
Arakelyan, Arsen
collection PubMed
description Background: Over the last decades, genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with different complex diseases. However, associations reported in one population are often conflicting and do not replicate when studied in other populations. One reason could be that most GWASs employ a case-control design in one or a limited number of populations, while little attention has been paid to the global distribution of disease-associated alleles across different populations. Moreover, the majority of GWASs have been performed on selected European, African, and Chinese populations, and a considerable number of populations remain understudied. Aim: We investigated the global distribution of the disease-associated SNPs discovered so far across worldwide populations of different ancestry and geographical regions, with a special focus on the understudied population of Armenians. Data and Methods: We used genotyping data from the Human Genome Diversity Project and from the Armenian population and combined them with disease-associated SNP data taken from public repositories, leading to a final dataset of 44,234 markers. Their frequency distribution across 1039 individuals from 53 populations was analyzed using self-organizing map (SOM) machine learning. Our SOM portrayal approach reduces data dimensionality, clusters SNPs with similar frequency profiles, and provides two-dimensional data images that enable visual evaluation of disease-associated SNP landscapes among human populations. Results: We find that populations from Africa, Oceania, and America show specific patterns of minor allele frequencies of disease-associated SNPs, while populations from Europe, the Middle East, Central South Asia, and Armenia mostly share similar patterns. Importantly, different sets of SNPs are associated with common polygenic diseases, such as cancer, diabetes, and neurodegeneration, in populations from different geographic regions. Armenians are characterized by a set of SNPs that is distinct from those of other populations from the neighboring geographical regions. Conclusion: Genetic associations of diseases vary considerably across populations, which necessitates health-related genotyping efforts, especially for populations that are so far understudied. SOM portrayal is a promising novel method in population genetic research, with particular strength in visualization-based comparison of SNP data.
format Online
Article
Text
id pubmed-6498285
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6498285 2019-05-17 Population Levels Assessment of the Distribution of Disease-Associated Variants With Emphasis on Armenians – A Machine Learning Approach Nikoghosyan, Maria Hakobyan, Siras Hovhannisyan, Anahit Loeffler-Wirth, Henry Binder, Hans Arakelyan, Arsen Front Genet Genetics Frontiers Media S.A. 2019-04-26 /pmc/articles/PMC6498285/ /pubmed/31105750 http://dx.doi.org/10.3389/fgene.2019.00394 Text en Copyright © 2019 Nikoghosyan, Hakobyan, Hovhannisyan, Loeffler-Wirth, Binder and Arakelyan. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Population Levels Assessment of the Distribution of Disease-Associated Variants With Emphasis on Armenians – A Machine Learning Approach
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6498285/
https://www.ncbi.nlm.nih.gov/pubmed/31105750
http://dx.doi.org/10.3389/fgene.2019.00394