
Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph

Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located on the non-coding part of the genome, it is currently assumed that these SNPs influence the...


Bibliographic Details
Main Authors: Vlietstra, Wytze J., Vos, Rein, van Mulligen, Erik M., Jenster, Guido W., Kors, Jan A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278741/
https://www.ncbi.nlm.nih.gov/pubmed/35830458
http://dx.doi.org/10.1371/journal.pone.0271395
_version_ 1784746249639952384
author Vlietstra, Wytze J.
Vos, Rein
van Mulligen, Erik M.
Jenster, Guido W.
Kors, Jan A.
author_facet Vlietstra, Wytze J.
Vos, Rein
van Mulligen, Erik M.
Jenster, Guido W.
Kors, Jan A.
author_sort Vlietstra, Wytze J.
collection PubMed
description Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located on the non-coding part of the genome, it is currently assumed that these SNPs influence the expression of nearby genes on the genome. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as “disease genes”. Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
format Online
Article
Text
id pubmed-9278741
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9278741 2022-07-14 Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph Vlietstra, Wytze J. Vos, Rein van Mulligen, Erik M. Jenster, Guido W. Kors, Jan A. PLoS One Research Article Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located on the non-coding part of the genome, it is currently assumed that these SNPs influence the expression of nearby genes on the genome. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as “disease genes”. Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs. Public Library of Science 2022-07-13 /pmc/articles/PMC9278741/ /pubmed/35830458 http://dx.doi.org/10.1371/journal.pone.0271395 Text en © 2022 Vlietstra et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Vlietstra, Wytze J.
Vos, Rein
van Mulligen, Erik M.
Jenster, Guido W.
Kors, Jan A.
Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title_full Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title_fullStr Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title_full_unstemmed Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title_short Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph
title_sort identifying genes targeted by disease-associated non-coding snps with a protein knowledge graph
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278741/
https://www.ncbi.nlm.nih.gov/pubmed/35830458
http://dx.doi.org/10.1371/journal.pone.0271395
work_keys_str_mv AT vlietstrawytzej identifyinggenestargetedbydiseaseassociatednoncodingsnpswithaproteinknowledgegraph
AT vosrein identifyinggenestargetedbydiseaseassociatednoncodingsnpswithaproteinknowledgegraph
AT vanmulligenerikm identifyinggenestargetedbydiseaseassociatednoncodingsnpswithaproteinknowledgegraph
AT jensterguidow identifyinggenestargetedbydiseaseassociatednoncodingsnpswithaproteinknowledgegraph
AT korsjana identifyinggenestargetedbydiseaseassociatednoncodingsnpswithaproteinknowledgegraph