Identification of disease-associated loci using machine learning for genotype and network data integration
MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a...
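Main authors: | Leal, Luis G; David, Alessia; Jarvelin, Marjo-Riita; Sebert, Sylvain; Männikkö, Minna; Karhunen, Ville; Seaby, Eleanor; Hoggart, Clive; Sternberg, Michael J E |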
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954643/ https://www.ncbi.nlm.nih.gov/pubmed/31070705 http://dx.doi.org/10.1093/bioinformatics/btz310 |
Field | Value |
---|---|
author | Leal, Luis G; David, Alessia; Jarvelin, Marjo-Riita; Sebert, Sylvain; Männikkö, Minna; Karhunen, Ville; Seaby, Eleanor; Hoggart, Clive; Sternberg, Michael J E |
collection | PubMed |
description | MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. RESULTS: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. AVAILABILITY AND IMPLEMENTATION: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6954643 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6954643 2020-01-16 Bioinformatics Original Papers Oxford University Press 2019-12-15 2019-05-09 /pmc/articles/PMC6954643/ /pubmed/31070705 http://dx.doi.org/10.1093/bioinformatics/btz310 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Identification of disease-associated loci using machine learning for genotype and network data integration |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954643/ https://www.ncbi.nlm.nih.gov/pubmed/31070705 http://dx.doi.org/10.1093/bioinformatics/btz310 |
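
The abstract names non-negative matrix tri-factorization (NMTF) as the basis of cNMTF but does not give its update rules. For reference only, the sketch below implements plain NMTF, X ≈ U S Vᵀ with U, S, V ≥ 0, using the standard multiplicative updates for the squared Frobenius loss, and reads cluster labels off the factors. It is not the cNMTF algorithm or the cnmtf R package: it has no ancestry correction, no variant-damage or gene-network terms, and no scoring of loci. The function names `nmtf` and `relative_error`, the toy genotype matrix and the iteration count are illustrative assumptions.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=2000, eps=1e-9, seed=0):
    """Plain non-negative matrix tri-factorization, X ~= U @ S @ V.T.

    Minimizes ||X - U S V^T||_F^2 with multiplicative updates.
    Illustrative sketch only: NOT cNMTF (no ancestry correction,
    no variant-damage or gene-network terms).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k1))  # row factor (e.g. individuals)
    S = rng.random((k1, k2))  # block/association matrix
    V = rng.random((m, k2))  # column factor (e.g. SNPs)
    for _ in range(n_iter):
        # Each factor is multiplied by (negative gradient part) / (positive
        # gradient part); all terms are non-negative, so factors stay >= 0.
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
    return U, S, V


def relative_error(X, U, S, V):
    """Frobenius-norm reconstruction error relative to ||X||_F."""
    return np.linalg.norm(X - U @ S @ V.T) / np.linalg.norm(X)


# Hypothetical toy data: 6 individuals x 8 SNPs, allele counts 0/1/2,
# where two groups of individuals carry alleles at disjoint SNP blocks.
X = np.zeros((6, 8))
X[:3, :4] = 2.0
X[3:, 4:] = 2.0

U, S, V = nmtf(X, k1=2, k2=2)
print(relative_error(X, U, S, V))
print(U.argmax(axis=1))  # cluster label per individual
print(V.argmax(axis=1))  # cluster label per SNP
```

Labels here come from the largest entry in each row of U (individuals) and of V (SNPs); this argmax readout is one common convention for NMTF co-clustering and is not necessarily how cNMTF derives its loci-trait associations.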