
A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize

Analysis of population genetic variation and structure is a common practice for genome-wide studies, including association mapping, ecology, and evolution studies in several crop species. In this study, machine learning (ML) clustering methods, K-means (KM), and hierarchical clustering (HC), in comb...


Bibliographic Details
Main Authors: López-Cortés, Xaviera Alejandra, Matamala, Felipe, Maldonado, Carlos, Mora-Poblete, Freddy, Scapim, Carlos Alberto
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7732446/
https://www.ncbi.nlm.nih.gov/pubmed/33329691
http://dx.doi.org/10.3389/fgene.2020.543459
author López-Cortés, Xaviera Alejandra
Matamala, Felipe
Maldonado, Carlos
Mora-Poblete, Freddy
Scapim, Carlos Alberto
collection PubMed
description Analysis of population genetic variation and structure is common practice in genome-wide studies, including association mapping, ecology, and evolution studies in several crop species. In this study, two machine learning (ML) clustering methods, K-means (KM) and hierarchical clustering (HC), were combined with a non-linear and a linear dimensionality reduction technique, deep autoencoder (DeepAE) and principal component analysis (PCA), respectively, to infer population structure and individual assignment of maize inbred lines, i.e., dent field corn (n = 97) and popcorn (n = 86). The results revealed that the HC method in combination with DeepAE-based data preprocessing (DeepAE-HC) was the most effective method to assign individuals to clusters (96% correct individual assignments), whereas DeepAE-KM, PCA-HC, and PCA-KM correctly assigned 92, 89, and 81% of the lines, respectively. These findings were consistent with both the Silhouette Coefficient (SC) and Davies–Bouldin validation indexes. Notably, DeepAE-HC also had better accuracy than the Bayesian clustering method implemented in InStruct. The results of this study showed that deep learning (DL)-based dimensionality reduction combined with ML clustering methods is a useful tool to determine genetically differentiated groups and to assign individuals to subpopulations in genome-wide studies without having to rely on prior genetic assumptions.
format Online
Article
Text
id pubmed-7732446
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7732446 2020-12-15 Front Genet Genetics Frontiers Media S.A. 2020-11-24 /pmc/articles/PMC7732446/ /pubmed/33329691 http://dx.doi.org/10.3389/fgene.2020.543459 Text en Copyright © 2020 López-Cortés, Matamala, Maldonado, Mora-Poblete and Scapim.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize
topic Genetics
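
The abstract in this record describes a two-step pipeline: reduce the marker data to a few dimensions, then cluster the lines and count correct assignments. The following is a minimal sketch of the simplest variant named there (PCA followed by K-means), not the authors' implementation. Everything in it is an assumption made for illustration: the genotypes are synthetic (only the group sizes, 97 and 86, come from the abstract), the marker count, allele frequencies, and function names are hypothetical, and the deep autoencoder, hierarchical clustering, and validation indexes are not covered.

```python
import numpy as np


def simulate_genotypes(n_a=97, n_b=86, n_markers=200, seed=0):
    """Build a synthetic genotype matrix for two inbred groups (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Assumed allele frequencies: group B is shifted by 0.4 at every marker.
    freq_a = rng.uniform(0.1, 0.5, n_markers)
    freq_b = freq_a + 0.4
    # Inbred lines are treated as fully homozygous, so genotypes are 0 or 2.
    geno_a = 2 * (rng.random((n_a, n_markers)) < freq_a)
    geno_b = 2 * (rng.random((n_b, n_markers)) < freq_b)
    labels = np.array([0] * n_a + [1] * n_b)
    return np.vstack([geno_a, geno_b]).astype(float), labels


def pca_reduce(x, n_components=2):
    """Project the rows of x onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k=2, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    # First center: point farthest from the overall mean.
    idx = [int(np.linalg.norm(x - x.mean(axis=0), axis=1).argmax())]
    # Remaining centers: point farthest from the centers chosen so far.
    while len(idx) < k:
        d = np.min(
            [np.linalg.norm(x - x[i], axis=1) for i in idx], axis=0
        )
        idx.append(int(d.argmax()))
    centers = x[idx].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        new_centers = np.array([
            x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def assignment_accuracy(true_labels, clusters):
    """Fraction of correct assignments for two groups, ignoring label order."""
    agree = float(np.mean(true_labels == clusters))
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    genotypes, groups = simulate_genotypes()
    scores = pca_reduce(genotypes, n_components=2)
    clusters = kmeans(scores, k=2)
    print("correct assignments:", assignment_accuracy(groups, clusters))
```

Because the simulated groups are deliberately well separated, the accuracy printed here says nothing about real maize data; the sketch only shows where each step of the described workflow fits.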