CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large-Scale Sequencing Data
Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a nei...
Main Authors: | Kuismin, Markku O.; Ahlinder, Jon; Sillanpää, Mikko J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2017 |
Subjects: | Investigations |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5633386/ https://www.ncbi.nlm.nih.gov/pubmed/28830924 http://dx.doi.org/10.1534/g3.117.300131 |
_version_ | 1783269883926020096 |
---|---|
author | Kuismin, Markku O. Ahlinder, Jon Sillanpää, Mikko J. |
author_facet | Kuismin, Markku O. Ahlinder, Jon Sillanpää, Mikko J. |
author_sort | Kuismin, Markku O. |
collection | PubMed |
description | Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a neighborhood selection algorithm to infer population genetic structure and gene flow between populations. The resulting relationships are used to construct an individual-level population graph. Different network substructures known as communities are then dissociated from each other using a community detection algorithm. Inference of population structure using networks combines the good properties of: (i) network theory (broad collection of tools, including aesthetically pleasing visualization), (ii) principal component analysis (dimension reduction together with simple visual inspection), and (iii) model-based methods (e.g., ancestry coefficient estimates). We have named our process CONE (for community oriented network estimation). CONE has fewer restrictions than conventional assignment methods in that properties such as the number of subpopulations need not be fixed before the analysis and the sample may include close relatives or involve uneven sampling. Applying CONE on simulated data sets resulted in more accurate estimates of the true number of subpopulations than model-based methods, and provided comparable ancestry coefficient estimates. Inference of empirical data sets of teosinte single nucleotide polymorphism, bacterial disease outbreak, and the human genome diversity panel illustrate that population structures estimated with CONE are consistent with the earlier findings |
format | Online Article Text |
id | pubmed-5633386 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-5633386 2017-10-18 CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large-Scale Sequencing Data Kuismin, Markku O. Ahlinder, Jon Sillanpää, Mikko J. G3 (Bethesda) Investigations Genetics Society of America 2017-08-22 /pmc/articles/PMC5633386/ /pubmed/28830924 http://dx.doi.org/10.1534/g3.117.300131 Text en Copyright © 2017 Kuismin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large-Scale Sequencing Data |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5633386/ https://www.ncbi.nlm.nih.gov/pubmed/28830924 http://dx.doi.org/10.1534/g3.117.300131 |
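
The abstract describes a three-step pipeline: regress each individual on all others with a LASSO penalty (neighborhood selection), turn the nonzero coefficients into an individual-level graph, and split that graph into communities. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a plain Gaussian LASSO solved by coordinate descent rather than the generalized linear model used in the paper, and it substitutes connected components for a real community detection algorithm. All function names, the toy genotype matrix, and the penalty value `lam=0.1` are illustrative assumptions.

```python
def center(values):
    """Subtract the mean so no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(columns, y, lam, sweeps=200, tol=1e-9):
    """Minimise (1/2m)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    `columns` is a list of predictor vectors, each the same length as y.
    """
    m = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    scale = [sum(x * x for x in col) / m for col in columns]
    for _ in range(sweeps):
        max_change = 0.0
        for j, col in enumerate(columns):
            if scale[j] == 0.0:
                continue  # constant predictor carries no information
            # Correlation of predictor j with the partial residual.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / m
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def neighborhood_graph(genotypes, lam):
    """Neighborhood selection: one LASSO fit per individual.

    Rows of `genotypes` are individuals, columns are markers. Individuals
    i and j are linked if either regression selects the other (OR rule).
    """
    n = len(genotypes)
    centered = [center(row) for row in genotypes]
    adjacency = [[False] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        beta = lasso_cd([centered[j] for j in others], centered[i], lam)
        for j, coef in zip(others, beta):
            if coef != 0.0:
                adjacency[i][j] = adjacency[j][i] = True
    return adjacency


def connected_components(adjacency):
    """Crude stand-in for community detection: group linked individuals."""
    n = len(adjacency)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in range(n):
                if adjacency[node][nb] and not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


# Toy data (hypothetical): 4 individuals x 4 markers, genotypes coded 0/1/2.
toy_genotypes = [
    [2, 0, 1, 1],
    [2, 0, 1, 1],
    [1, 1, 2, 0],
    [1, 1, 2, 0],
]
adjacency = neighborhood_graph(toy_genotypes, lam=0.1)
communities = connected_components(adjacency)
print(communities)  # → [[0, 1], [2, 3]]
```

Note that the number of groups is not passed in anywhere; it falls out of the estimated graph, which mirrors the abstract's point that the number of subpopulations need not be fixed beforehand. On real data the penalty would need to be tuned, and a modularity-based or similar community detection method would replace the connected-components step.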