The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus

Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as...

Bibliographic Details
Main authors: Ruigrok, Mike; Xue, Bing; Catanach, Andrew; Zhang, Mengjie; Jesson, Linley; Davy, Marcus; Wellenreuther, Maren
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9320665/
https://www.ncbi.nlm.nih.gov/pubmed/35885912
http://dx.doi.org/10.3390/genes13071129
author Ruigrok, Mike
Xue, Bing
Catanach, Andrew
Zhang, Mengjie
Jesson, Linley
Davy, Marcus
Wellenreuther, Maren
author_sort Ruigrok, Mike
collection PubMed
description Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods. Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied. Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. 
Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0–0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone. Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype–phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.
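The chi-square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as categorical values (0, 1, 2) and scores each variant with a Pearson chi-square statistic against the size class, then keeps the highest-scoring variants. The toy data, the variant names, and the helper functions `chi_square_score` and `select_top_k` are hypothetical, and this is not necessarily the implementation the authors used.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # counts per (genotype, class) cell
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            expected = f_count * l_count / n  # expected count under independence
            diff = observed.get((f_val, l_val), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Return the k feature names with the highest score, plus all scores."""
    scores = {name: chi_square_score(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data, made up for illustration: genotypes coded 0/1/2 for six fish.
labels = ["small", "small", "medium", "medium", "large", "large"]
columns = {
    "variant_a": [0, 0, 1, 1, 2, 2],  # tracks the size class exactly
    "variant_b": [0, 1, 0, 1, 0, 1],  # independent of the size class
    "variant_c": [0, 0, 0, 1, 2, 2],  # partly informative
}

top, scores = select_top_k(columns, labels, k=2)
print(top)
```

In a workflow like the one the abstract describes, the selected variants would then be passed to a classifier such as KNN or logistic regression. With only a few dozen individuals, the selection would need to happen inside each cross-validation fold so that held-out fish do not influence which variants are kept.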
format Online
Article
Text
id pubmed-9320665
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9320665 2022-07-27 Genes (Basel) Article MDPI 2022-06-23 /pmc/articles/PMC9320665/ /pubmed/35885912 http://dx.doi.org/10.3390/genes13071129 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9320665/
https://www.ncbi.nlm.nih.gov/pubmed/35885912
http://dx.doi.org/10.3390/genes13071129