Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions
Main authors: | Guo, Xinpeng; Han, Jinyu; Song, Yafei; Yin, Zhilei; Liu, Shuaichen; Shang, Xuequn |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9421127/ https://www.ncbi.nlm.nih.gov/pubmed/36046233 http://dx.doi.org/10.3389/fgene.2022.921775 |
author | Guo, Xinpeng; Han, Jinyu; Song, Yafei; Yin, Zhilei; Liu, Shuaichen; Shang, Xuequn |
collection | PubMed |
description | Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype–phenotype map. With the continuing development of high-throughput technologies and falling sequencing costs, multi-omics analysis has become more widely employed. While this offers new opportunities to uncover the mechanisms of correlation between single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces several challenges: 1) even when the sample size is large enough, the number of available omics types is often insufficient for multi-omics analysis; 2) the internal correlations within each omics layer, such as the correlations between genes in genomics, are often unclear; 3) when a large number of traits (p) is analyzed, the sample size (n) is often much smaller than p (n ≪ p), which hinders the application of machine learning methods to the classification of disease outcomes. Results: To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The model also takes the correlations within each omics layer into account, so that it more closely resembles biological reality. To verify the capabilities of this method, we conducted experiments on the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. The results show that the proposed method can achieve high classification accuracy and easy-to-interpret feature selection. This method extends genotype–phenotype association analysis to deep learning networks. |
format | Online Article Text |
id | pubmed-9421127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Frontiers in Genetics (Front Genet), published 2022-08-15 |
rights | Copyright © 2022 Guo, Han, Song, Yin, Liu and Shang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions |
topic | Genetics |
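
The description says G-EDNN uses eQTL data to make the connections between network layers sparse, and that it also models correlations within each omics layer. As a rough illustration only, the NumPy sketch below shows one common way to obtain that kind of sparsity: a dense weight matrix is multiplied elementwise by a binary mask. In this sketch, a SNP connects only to genes for which it is (hypothetically) an eQTL, and a gene connects only to itself and to its neighbors in a prior gene-gene graph. The masks, toy sizes, the genotype coding, and the `masked_dense` helper are all hypothetical and are not taken from the paper. The authors' actual architecture and training procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_dense(x, weights, mask, bias):
    """Dense layer whose weights are zeroed wherever mask == 0 (sparse connectivity)."""
    return x @ (weights * mask) + bias

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical toy sizes: 4 samples, 6 SNPs, 3 genes, 2 outcome classes.
n_samples, n_snps, n_genes, n_classes = 4, 6, 3, 2

# Hypothetical eQTL mask: entry (i, j) = 1 if SNP i is an eQTL for gene j.
eqtl_mask = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

# Hypothetical gene-gene adjacency from a prior network, plus self-loops
# so that each gene node keeps its own signal.
gene_adj = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
], dtype=float)
graph_mask = gene_adj + np.eye(n_genes)

# Hypothetical genotypes coded as minor-allele counts 0/1/2.
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)

W1 = rng.normal(size=(n_snps, n_genes))
W2 = rng.normal(size=(n_genes, n_genes))
W3 = rng.normal(size=(n_genes, n_classes))
b1, b2, b3 = np.zeros(n_genes), np.zeros(n_genes), np.zeros(n_classes)

h_gene = relu(masked_dense(X, W1, eqtl_mask, b1))         # SNP -> gene, eQTL-constrained
h_graph = relu(masked_dense(h_gene, W2, graph_mask, b2))  # gene -> gene, graph-embedded
logits = h_graph @ W3 + b3                                # fully connected output layer

# Numerically stable softmax over classes.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
```

Masked entries are multiplied by zero on every forward pass, so they also receive zero gradient during training. In this toy setup, the first two layers therefore have `eqtl_mask.sum() + graph_mask.sum()` = 11 effective weights instead of the 27 of their dense counterparts. Reducing parameters in this way is one way such masking can limit overfitting when n ≪ p.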