
Identification of disease-associated loci using machine learning for genotype and network data integration

MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a...


Bibliographic Details
Main authors: Leal, Luis G, David, Alessia, Jarvelin, Marjo-Riita, Sebert, Sylvain, Männikkö, Minna, Karhunen, Ville, Seaby, Eleanor, Hoggart, Clive, Sternberg, Michael J E
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954643/
https://www.ncbi.nlm.nih.gov/pubmed/31070705
http://dx.doi.org/10.1093/bioinformatics/btz310
author Leal, Luis G
David, Alessia
Jarvelin, Marjo-Riita
Sebert, Sylvain
Männikkö, Minna
Karhunen, Ville
Seaby, Eleanor
Hoggart, Clive
Sternberg, Michael J E
collection PubMed
description MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. RESULTS: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. AVAILABILITY AND IMPLEMENTATION: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6954643
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6954643 2020-01-16 Bioinformatics Original Papers Oxford University Press 2019-12-15 2019-05-09 /pmc/articles/PMC6954643/ /pubmed/31070705 http://dx.doi.org/10.1093/bioinformatics/btz310 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of disease-associated loci using machine learning for genotype and network data integration
topic Original Papers